Hierarchical Orthologous Groups: a unifying and scalable framework for large-scale gene evolution
Sequencing data are rapidly piling up. Because many genes are highly conserved in sequence and function across species, in some cases despite billions of years of intervening evolution, knowledge painstakingly gleaned through experiments can often be propagated to evolutionarily related genes. In theory, the more we know about the sequence universe, the easier it should be to elucidate these evolutionary relationships. Frustratingly, however, the opposite seems true in practice: dealing with multiple species is conceptually and practically challenging, and as a result many evolutionary analyses remain stuck in a “two-species-at-a-time” paradigm or consider only single-copy genes across multiple species. To overcome this impasse, I will introduce the concept of Hierarchical Orthologous Groups (HOGs): nested groups of genes descending from a single ancestral gene within clades of interest. I will present an algorithm to accurately and efficiently infer HOGs, and I will show how HOGs can be used to reconstruct ancestral genomes and to propagate functional knowledge from model species to non-model species.
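To make the definition of a HOG concrete, here is a minimal Python sketch. It assumes the evolutionary history of one gene family is already known as a gene tree reconciled with a species tree: each internal node is labeled as a speciation ("S") or duplication ("D") and mapped to a species-tree node, following the usual LCA-reconciliation convention. It is not the HOG inference algorithm presented in the talk; it only shows what the nested groups look like once that history is available. The toy species tree, gene names, and helpers (`GeneNode`, `hogs_at_level`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy species tree, stored as child -> parent.
SPECIES_PARENT = {
    "human": "Mammalia",
    "mouse": "Mammalia",
    "frog": "Tetrapoda",
    "Mammalia": "Tetrapoda",
    "Tetrapoda": None,
}


def ancestors(taxon: str | None) -> list[str]:
    """Return the taxon followed by all of its ancestors up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = SPECIES_PARENT[taxon]
    return path


def is_ancestor_or_self(a: str, b: str) -> bool:
    """True if taxon `a` equals taxon `b` or is one of its ancestors."""
    return a in ancestors(b)


@dataclass
class GeneNode:
    """Node of a reconciled gene tree (hypothetical representation).

    taxon: species-tree node this gene-tree node maps to; for a leaf,
           the species that carries the gene.
    event: "S" (speciation), "D" (duplication), or None for a leaf.
    """
    taxon: str
    event: str | None = None
    name: str | None = None
    children: list[GeneNode] = field(default_factory=list)


def leaves(node: GeneNode) -> list[str]:
    """Names of all extant genes below this node."""
    if not node.children:
        return [node.name]
    return [g for child in node.children for g in leaves(child)]


def hogs_at_level(node: GeneNode, level: str) -> list[list[str]]:
    """Split a reconciled gene tree into HOGs at one taxonomic level.

    Each group holds the extant genes (in species of clade `level`) that
    descend from a single ancestral gene in the ancestor of `level`.
    """
    inside = is_ancestor_or_self(level, node.taxon)
    dup_leading_to_level = node.taxon == level and node.event == "D"
    if inside and not dup_leading_to_level:
        # Inside the clade, or the speciation at `level` itself: the whole
        # subtree descends from one ancestral gene present at `level`.
        return [sorted(leaves(node))]
    if is_ancestor_or_self(node.taxon, level):
        # Older than `level`, or a duplication in the lineage leading to it:
        # each child may carry a distinct ancestral gene, so recurse.
        return [h for child in node.children for h in hogs_at_level(child, level)]
    # A clade that does not contain `level`: contributes nothing.
    return []


# Hypothetical family: one gene in the tetrapod ancestor, duplicated in the
# mammalian lineage before the human/mouse split.
tree = GeneNode("Tetrapoda", "S", children=[
    GeneNode("Mammalia", "D", children=[
        GeneNode("Mammalia", "S", children=[GeneNode("human", name="h1"),
                                            GeneNode("mouse", name="m1")]),
        GeneNode("Mammalia", "S", children=[GeneNode("human", name="h2"),
                                            GeneNode("mouse", name="m2")]),
    ]),
    GeneNode("frog", name="f1"),
])

if __name__ == "__main__":
    print(hogs_at_level(tree, "Tetrapoda"))  # [['f1', 'h1', 'h2', 'm1', 'm2']]
    print(hogs_at_level(tree, "Mammalia"))   # [['h1', 'm1'], ['h2', 'm2']]
```

The nesting is visible in the output: the single tetrapod-level HOG contains both mammal-level HOGs, because the duplication happened after the mammals split from the frog lineage. Counting the HOGs at a level (here one at Tetrapoda, two at Mammalia) gives the number of ancestral genes of this family, among those with surviving descendants, in the corresponding ancestral genome, which hints at how HOGs support ancestral genome reconstruction.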