PhyloMCL: Accurate Clustering of Hierarchical Orthogroups Guided by Phylogenetic Relationship and Inference of Polyploidy Events

Shengyu Zhou, Yamao Chen, Chunce Guo, Ji Qi
DOI: https://doi.org/10.1111/2041-210x.13401
2020-01-01
Methods in Ecology and Evolution
Abstract: Identification of homology relationships is essential for inferring gene functions, reconstructing the phylogeny of gene families, and uncovering the evolutionary history of life, and it is usually the first step of many genetic and genomic studies. However, the presence of gene duplicates, variation in evolutionary rates among homologs, and the fusion and fission of genes can lead to misidentification of evolutionary relationships among homologs. Here we provide a Markov clustering based method called PhyloMCL to accurately detect hierarchical orthogroups (HOGs), including orthologs and paralogs derived from duplications subsequent to the speciation of the species involved, by considering both the phylogenetic relationships of organisms and the effects of polyploidy events. Its performance, evaluated against a list of benchmark gene families when applied to the clustering of HOGs from 12 metazoan genomes, reaches up to 87.8% recall and 83.2% precision. Further application of PhyloMCL to the classification of tens of thousands of paralogs, produced by multiple polyploidy events during the evolution of seed plants, successfully identifies the majority of in-/out-paralogs at different taxonomic levels. Benefiting from the Markov clustering strategy and the guidance of a species tree, PhyloMCL can accurately classify millions of homologous genes in affordable time, meeting the challenge that the rapid increase in sequenced genomes poses for phylogenomic studies.
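
For readers unfamiliar with the Markov clustering strategy the abstract builds on, the following is a minimal sketch of the generic MCL iteration (expansion followed by inflation) on a small similarity graph. It is not the PhyloMCL implementation: the function names `normalize_columns`, `matmul`, and `mcl`, the `inflation` value, and the toy adjacency matrix are all illustrative assumptions, and the sketch includes none of the species-tree guidance or polyploidy handling described above.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product, adequate for a toy example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=50, tol=1e-9):
    """Generic Markov clustering sketch (assumed parameters, not PhyloMCL's)."""
    n = len(adj)
    # Add self-loops so every column has a non-zero sum, then normalize.
    m = normalize_columns(
        [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: take two random-walk steps
        inflated = normalize_columns(  # inflation: favor strong flows
            [[v ** inflation for v in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow defines a cluster of the columns it reaches.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy graph: two disconnected triangles of mutually similar "genes".
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(sorted(sorted(c) for c in mcl(adj)))  # [[0, 1, 2], [3, 4, 5]]
```

In practice the edge weights would come from sequence similarity scores between genes, and a higher `inflation` value generally yields finer-grained clusters.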