Simulating and reconstructing genome evolution in vertebrates
Genomes evolve through a combination of forces that are difficult to study and conceptualise. Yet the field is maturing to the point where we can begin to build models that capture our current understanding of chromosome and genome evolution. We have built such a model, which accounts for chromosome rearrangements (fusions, fissions, inversions and translocations) and gene events (appearances, duplications and deletions), and we have estimated the rates at which these changes might have taken place in real evolution. We implemented this model and these estimated rates in a computer simulator called Magsimus, and used it to simulate the evolution of the chicken, opossum, dog, mouse and human genomes from a common amniote ancestor. Using this approach, we show that it is possible to converge rapidly on optimal rates that realistically reproduce properties of modern genomes. Finally, with Magsimus as a realistic simulator of genome evolution, we benchmarked AGORA, an algorithm that reconstructs ancestral vertebrate genomes. These developments provide a new framework for testing evolutionary hypotheses and for evaluating how incomplete our understanding of genome evolution may be.
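To make the model's event types concrete, here is a minimal sketch that applies random chromosome rearrangements and gene events to a toy genome. It is not Magsimus's implementation: the genome representation (chromosomes as lists of signed integer gene ids, the sign giving the strand), the function names and the rate values in `RATES` are all hypothetical, and each event type is simply drawn with probability proportional to its rate.

```python
import random

# Hypothetical toy representation: a genome is a list of chromosomes, and a
# chromosome is a list of non-zero integer gene ids (negative = reverse strand).

def invert(chrom, i, j):
    """Inversion: reverse chrom[i:j] and flip the strand of each gene in it."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j])] + chrom[j:]

def fission(genome, c, k):
    """Fission: split chromosome c before position k."""
    chrom = genome[c]
    return genome[:c] + [chrom[:k], chrom[k:]] + genome[c + 1:]

def fusion(genome, a, b):
    """Fusion: join chromosome b onto the end of chromosome a (a != b)."""
    fused = genome[a] + genome[b]
    rest = [ch for idx, ch in enumerate(genome) if idx not in (a, b)]
    return rest + [fused]

def translocation(genome, a, b, i, j):
    """Reciprocal translocation: exchange the tails genome[a][i:] and genome[b][j:]."""
    new = [list(ch) for ch in genome]
    new[a] = genome[a][:i] + genome[b][j:]
    new[b] = genome[b][:j] + genome[a][i:]
    return new

def simulate(genome, n_events, rates, seed=0):
    """Draw n_events events, each type chosen with probability proportional to its rate."""
    rng = random.Random(seed)
    genome = [list(ch) for ch in genome]  # never mutate the caller's genome
    next_id = max(abs(g) for ch in genome for g in ch) + 1
    kinds, weights = zip(*rates.items())
    for _ in range(n_events):
        kind = rng.choices(kinds, weights=weights)[0]
        c = rng.randrange(len(genome))
        chrom = genome[c]
        if kind == "inversion" and len(chrom) >= 2:
            i, j = sorted(rng.sample(range(len(chrom) + 1), 2))
            genome[c] = invert(chrom, i, j)
        elif kind == "fission" and len(chrom) >= 2:
            genome = fission(genome, c, rng.randrange(1, len(chrom)))
        elif kind == "fusion" and len(genome) >= 2:
            a, b = rng.sample(range(len(genome)), 2)
            genome = fusion(genome, a, b)
        elif kind == "translocation" and len(genome) >= 2:
            a, b = rng.sample(range(len(genome)), 2)
            if len(genome[a]) >= 2 and len(genome[b]) >= 2:
                i = rng.randrange(1, len(genome[a]))
                j = rng.randrange(1, len(genome[b]))
                genome = translocation(genome, a, b, i, j)
        elif kind == "duplication":
            p = rng.randrange(len(chrom))
            chrom.insert(p + 1, chrom[p])  # tandem copy keeps the family id
        elif kind == "deletion" and len(chrom) >= 2:
            del chrom[rng.randrange(len(chrom))]
        elif kind == "appearance":
            chrom.insert(rng.randrange(len(chrom) + 1), next_id)
            next_id += 1
        # Any other case (an unmet precondition) is skipped as a no-op.
    return genome

# Hypothetical relative rates, for illustration only.
RATES = {"inversion": 4.0, "fission": 1.0, "fusion": 1.0, "translocation": 1.0,
         "duplication": 2.0, "deletion": 2.0, "appearance": 1.0}

ancestor = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
print(simulate(ancestor, 50, RATES, seed=42))
```

Events whose preconditions fail (for example, a fusion in a single-chromosome genome) are skipped, so `n_events` is an upper bound on the number of changes applied. Fitting rates to real genomes, as described above, would wrap a loop like this in a search that compares simulated and observed genome properties.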
This is joint work with Joseph Lucas, Lucas Tittmann and Matthieu Muffato.
Navigating Incongruence and Uncertainty in Genome-scale Phylogenetic and Ancestral State Reconstructions of Enterobacteria
A robust phylogenetic framework is a cornerstone of evolutionary analysis and illuminates myriad biological questions. In some groups of bacteria, extensive horizontal gene transfer can significantly confound the detection of a predominant phylogenetic history, even one shared by a majority of the genomes in a given set. Genomic regions or individual genes with alternative evolutionary histories can, however, also provide biological insights. Taxon sampling biases and inherent data limitations for individual genes can make it challenging to distinguish true incongruence from uncertainty. Here, we present comparative genomic analyses of enterobacteria, a ubiquitous model family whose members live freely in soil, water and air or in association with hosts as diverse as humans and potatoes. Through a combination of orthology prediction and gene-by-gene, total-evidence, coalescent-based and quartet-based approaches to phylogenetic reconstruction, coupled with inference of ancestral genome content and of metabolic and regulatory networks, we dissect the genome-wide evolutionary history of enterobacteria. Our ultimate goal is to understand the emergence of complex traits such as host range, pathogenicity and responses to abiotic environmental parameters such as oxygen availability.
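Quartet-based approaches summarise many gene trees by recording, for each set of four taxa, which of the three possible unrooted topologies each gene tree supports. A minimal sketch follows, assuming gene trees given as nested Python tuples of taxon names (a hypothetical toy format) and ignoring branch support values; it is not the authors' pipeline. Keeping unresolved gene trees apart from conflicting ones is a crude first step toward separating uncertainty from genuine incongruence.

```python
from collections import Counter

def collect_clusters(tree, out):
    """Append the leaf set of every subtree of a nested-tuple tree to `out`; return the root's."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves

def quartet_topology(tree, quartet):
    """Split induced on four taxa as {{a, b}, {c, d}}; None if unresolved or a taxon is missing."""
    q = frozenset(quartet)
    clusters = []
    root = collect_clusters(tree, clusters)
    if not q <= root:
        return None  # this gene tree lacks one of the four taxa
    for cluster in clusters:
        side = cluster & q
        if len(side) == 2:  # the edge above this cluster separates two taxa from the other two
            return frozenset([side, q - side])
    return None  # e.g. a polytomy: no edge resolves the quartet

def quartet_support(gene_trees, quartet):
    """Tally the quartet topologies supported across many gene trees."""
    return Counter(quartet_topology(t, quartet) for t in gene_trees)

def label(split):
    return "|".join(sorted("".join(sorted(side)) for side in split)) if split else "unresolved"

# Hypothetical gene trees for taxa A-D.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),  # conflicts with the majority
    ("A", "B", "C", "D"),      # star tree: uninformative
]
support = quartet_support(gene_trees, ("A", "B", "C", "D"))
for topology, n in support.most_common():
    print(label(topology), n)
```

Rooted clusters suffice here because every edge of the unrooted tree corresponds to a cluster of the rooted one (or its complement).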
Hierarchical Orthologous Groups: a unifying and scalable framework for large-scale gene evolution
Sequencing data are rapidly piling up. Because many genes are highly conserved in sequence and function across species, in some cases despite billions of years of intervening evolution, knowledge painstakingly gleaned through experiments can often be propagated across evolutionarily related genes. In theory, the more we know about the sequence universe, the easier it should become to elucidate these evolutionary relationships. Frustratingly, however, the opposite seems true in practice: dealing with multiple species is conceptually and practically challenging, and as a result many evolutionary analyses remain stuck in a “two species at a time” paradigm or consider only genes that are single-copy across multiple species. To overcome this impasse, I’ll introduce the concept of Hierarchical Orthologous Groups (HOGs): nested groups of genes descending from a single ancestral gene within clades of interest. I’ll present an algorithm to infer HOGs accurately and efficiently, and I will show how HOGs can be used to reconstruct ancestral genomes and to propagate functional knowledge from model species to non-model species.
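To illustrate the definition, here is a minimal sketch that reads HOGs off a single gene tree already reconciled with a species tree by the standard LCA-mapping rule. It is a tree-based shortcut, not the inference algorithm presented in the talk, which the abstract does not describe; the species tree, gene ids and function names are hypothetical. A HOG at a given clade is a set of genes descending from one gene in that clade's last common ancestor, so a duplication that predates the ancestor splits genes into separate HOGs, while later events do not.

```python
from functools import reduce

# Hypothetical toy species tree, stored as child -> parent links.
SPECIES_PARENT = {
    "human": "Primates", "chimp": "Primates",
    "Primates": "Mammalia", "mouse": "Mammalia",
    "Mammalia": "Amniota", "chicken": "Amniota",
}

def lineage(taxon):
    """Return [taxon, parent, ..., root] in the species tree."""
    path = [taxon]
    while path[-1] in SPECIES_PARENT:
        path.append(SPECIES_PARENT[path[-1]])
    return path

def is_within(taxon, clade):
    """True if taxon is clade itself or one of its descendants."""
    return clade in lineage(taxon)

def species_lca(a, b):
    """Last common ancestor of two species-tree nodes."""
    above_b = set(lineage(b))
    return next(t for t in lineage(a) if t in above_b)

class GeneNode:
    """Gene-tree node: a leaf carries a gene id and species; internal nodes carry children."""

    def __init__(self, children=(), gene=None, species=None):
        self.children = list(children)
        self.gene = gene
        self.species = species

    def leaves(self):
        if not self.children:
            return [self]
        return [lf for child in self.children for lf in child.leaves()]

    def mapping(self):
        # LCA reconciliation: map the node to the LCA of its leaves' species.
        return reduce(species_lca, (lf.species for lf in self.leaves()))

    def is_duplication(self):
        # Standard LCA rule: a duplication if some child maps to the same species node.
        return any(child.mapping() == self.mapping() for child in self.children)

def hogs_at(node, level):
    """HOGs at `level`: sorted gene-id lists, each descending from one ancestral gene."""
    m = node.mapping()
    if is_within(m, level):
        if m == level and node.is_duplication():
            # The duplication predates the `level` ancestor: each copy founds its own HOG.
            return [h for child in node.children for h in hogs_at(child, level)]
        # A speciation at `level`, or any event after it: one ancestral gene.
        return [sorted(lf.gene for lf in node.leaves())]
    if is_within(level, m):
        # The node predates `level`: descend into the lineages leading into the clade.
        return [h for child in node.children for h in hogs_at(child, level)]
    return []  # none of this node's genes belong to `level`

def gene_leaf(gene, species):
    return GeneNode(gene=gene, species=species)

tree = GeneNode([
    gene_leaf("C1", "chicken"),
    GeneNode([
        gene_leaf("M1", "mouse"),
        GeneNode([  # duplication mapped to Primates, before the human/chimp split
            GeneNode([gene_leaf("H1", "human"), gene_leaf("Ch1", "chimp")]),
            GeneNode([gene_leaf("H2", "human"), gene_leaf("Ch2", "chimp")]),
        ]),
    ]),
])

print(hogs_at(tree, "Mammalia"))  # [['Ch1', 'Ch2', 'H1', 'H2', 'M1']]
print(hogs_at(tree, "Primates"))  # [['Ch1', 'H1'], ['Ch2', 'H2']]
```

The mammalian level returns one HOG, while the primate level splits it into two nested sub-HOGs: this nesting across clades is the hierarchy the name refers to.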
Strain-level microbial comparative genomics using shotgun metagenomics
Microbial comparative genomics is now routinely performed by sequencing the genomes of target microorganisms cultivated in vitro. Advances in genome assembly and analysis are providing very accurate tools for detecting genomic features (genes, SNPs, motifs, repeats) associated with conditions of interest and with epidemiological patterns. This approach, however, is time-consuming and generally applicable only to the fraction of human-associated microbes (typically pathogens) for which a cultivation protocol is known. Shotgun metagenomics, on the other hand, provides an untargeted snapshot of the whole microbial diversity populating an environment, but accurate taxonomic profiling is still a challenging task. Here I will present new methods we have developed that profile microbes from metagenomes at strain-level resolution. This enables cultivation-free microbial population genomics and epidemiology studies using the several thousand publicly available metagenomes. I will describe this novel computational framework for “meta-epidemiology” and discuss several results on human-associated commensals and opportunistic pathogens.
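One common route to strain-level comparison is to compare single-nucleotide variants in species-specific marker genes across samples. As an illustration of that general idea only (the speaker's methods are not detailed in the abstract), the sketch below calls a majority-rule consensus for one marker gene from per-position base counts in each metagenome and computes the proportion of differing sites between two samples. The pileup format, the thresholds `min_depth` and `min_freq`, and the function names are hypothetical.

```python
def consensus(pileup, min_depth=4, min_freq=0.8):
    """Majority-rule consensus; 'N' where depth or dominance of the top base is insufficient."""
    calls = []
    for counts in pileup:  # counts: base -> number of reads at this position
        depth = sum(counts.values())
        if depth < min_depth:
            calls.append("N")
            continue
        base, n = max(counts.items(), key=lambda kv: kv[1])
        # A mixed position suggests several strains in the sample, so leave it uncalled.
        calls.append(base if n / depth >= min_freq else "N")
    return "".join(calls)

def snv_distance(seq_a, seq_b):
    """Proportion of differing sites among positions called in both consensus sequences."""
    shared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "N" and b != "N"]
    if not shared:
        return None  # the two strains cannot be compared on this marker
    return sum(a != b for a, b in shared) / len(shared)

# Made-up per-position base counts for one marker gene in two metagenomes.
sample_1 = [{"A": 10}, {"C": 8, "T": 1}, {"G": 2}, {"T": 5, "A": 5}]
sample_2 = [{"A": 6}, {"T": 7}, {"G": 9}, {"T": 9}]

c1, c2 = consensus(sample_1), consensus(sample_2)
print(c1, c2, snv_distance(c1, c2))  # ACNN ATGT 0.5
```

Pairwise distances of this kind, concatenated over many markers, are what a strain-level phylogeny or transmission analysis across metagenomes would be built from.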
Investigating meiosis in allohexaploid Brassica
The Brassica genus contains many agriculturally significant crop species with interesting genomic relationships. Three diploid species (cabbage, turnip and black mustard; 2n = CC, AA and BB, respectively) gave rise to the allotetraploid species Indian mustard, Ethiopian mustard and oilseed rape (2n = AABB, BBCC and AACC, respectively). Although allohexaploid Brassica (2n = AABBCC) does not exist in nature, combining the three genomes may allow the production of a new, vigorous hybrid crop species, as well as the investigation of polyploid speciation processes. Genome stability and meiotic behaviour were investigated in different allohexaploid Brassica populations using a high-throughput molecular genotyping approach. Inheritance of A-, B- and C-genome alleles and meiotic interactions between the three genomes were quantified. Identifying the genetic and genomic factors that control meiosis in Brassica will assist in producing a new, stable allohexaploid crop species.
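The genome formulas translate directly into chromosome numbers. The small worked example below uses the well-established haploid chromosome numbers of the three diploid genomes (A: n = 10, B: n = 8, C: n = 9); the function names are illustrative only. It also prints the number of bivalents expected if every chromosome pairs only with its homologue at meiosis, the baseline from which homoeologous pairing between chromosomes of different genomes would be a departure.

```python
from itertools import combinations

# Haploid chromosome number (n) contributed by each Brassica genome.
BASE_N = {"A": 10, "B": 8, "C": 9}

def genome_formula(genomes):
    """Each genome present as a homologous pair, e.g. ('C', 'A') -> 'AACC'."""
    return "".join(g * 2 for g in sorted(genomes))

def chromosome_count(formula):
    """Somatic chromosome number 2n: each letter in the formula is one haploid set."""
    return sum(BASE_N[g] for g in formula)

for k in (1, 2, 3):  # diploids, allotetraploids, then the synthetic allohexaploid
    for genomes in combinations("ABC", k):
        formula = genome_formula(genomes)
        two_n = chromosome_count(formula)
        print(f"{formula}: 2n = {two_n}, {two_n // 2} bivalents if pairing is strictly homologous")
```

For AABBCC this gives 2n = 54 and 27 bivalents; for oilseed rape (AACC), 2n = 38.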
Impacts of polyploidy on genome evolution in Brassica species
The triangle of U describes the relationships among six species of the Brassica genus, in which three species with lower chromosome numbers (Brassica rapa, Brassica oleracea and Brassica nigra) have hybridized in all pairwise combinations to generate three allopolyploid species (Brassica napus, Brassica juncea and Brassica carinata). Each of the lower-chromosome-number species is a paleopolyploid that evolved through multiple rounds of genome duplication and subsequent diploidisation. Complete genome sequences have been generated for each of these Brassica species, and comparative analyses among them are facilitated by alignment to a common ancestral Brassicaceae genome. The relationships among these species offer a unique opportunity to study the impact of multiple polyploidisation events on the evolution of genome structure: in particular, the maintenance or fractionation of duplicated gene complements, the potential for homoeologous chromosomal exchange in the allopolyploid species, and the consequent influence of these events on adaptive traits.
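As an illustration of how maintenance or fractionation of duplicated genes might be summarised after alignment to an ancestral genome, the sketch below assumes a hypothetical input: for each ancestral Brassicaceae gene, a tuple recording whether a copy survives in each of three subgenomes (three, reflecting the whole-genome triplication in the Brassica lineage). It reports the fraction of ancestral genes retained in 0, 1, 2 or 3 copies and the retention rate of each subgenome; unequal per-subgenome rates would point to biased fractionation. Gene ids and data are invented.

```python
from collections import Counter

def fractionation_summary(presence):
    """presence: ancestral gene id -> tuple of booleans, one per subgenome copy."""
    n_sub = len(next(iter(presence.values())))
    total = len(presence)
    # Number of surviving copies per ancestral gene (True counts as 1).
    copies = Counter(sum(flags) for flags in presence.values())
    by_copy_number = {k: copies[k] / total for k in range(n_sub + 1)}
    # Fraction of ancestral genes still present in each subgenome.
    by_subgenome = [sum(flags[i] for flags in presence.values()) / total for i in range(n_sub)]
    return by_copy_number, by_subgenome

# Invented example: four ancestral genes tracked across three subgenomes.
presence = {
    "anc1": (True, True, True),    # all three copies maintained
    "anc2": (True, False, False),  # fractionated to a single copy
    "anc3": (True, True, False),
    "anc4": (False, False, True),
}
by_copy_number, by_subgenome = fractionation_summary(presence)
print(by_copy_number)  # {0: 0.0, 1: 0.5, 2: 0.25, 3: 0.25}
print(by_subgenome)    # [0.75, 0.5, 0.5]
```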