OrthoHMM: Improved Inference of Ortholog Groups using Hidden Markov Models

Jacob L. Steenwyk, Thomas J. Buida, Antonis Rokas, Nicole King
DOI: https://doi.org/10.1101/2024.12.07.627370
2024-12-12
Abstract: Accurate orthology inference is essential for comparative genomics and phylogenomics. However, orthology inference is challenged by sequence divergence, which is pronounced among anciently diverged organisms. We present OrthoHMM, an algorithm that infers orthologous gene groups using Hidden Markov Models parameterized from substitution matrices, which enables better detection of remote homologs. Benchmarking indicates that OrthoHMM outperforms currently available methods; for example, on a curated set of Bilaterian orthogroups, OrthoHMM showed a 10.3% to 138.9% improvement in precision. Rank-based benchmarking using Bilaterian orthogroups and a novel dataset of orthogroups from organisms in three major eukaryotic kingdoms revealed that OrthoHMM had the best overall performance (6.7% to 97.8% overall improvement). These findings suggest that Hidden Markov Models improve orthogroup inference.
Bioinformatics
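
The improvement figures in the abstract are percentages relative to other methods. The abstract does not say how precision was counted or how the percentages were derived, so the following is only a minimal sketch under stated assumptions: precision is taken in its standard form (true positives over all predictions) and "improvement" is read as relative percent change over a baseline. The function names and the counts are hypothetical and are not taken from the paper.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Standard precision: the fraction of predictions that are correct."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("precision is undefined when nothing was predicted")
    return true_positives / predicted


def relative_improvement(new: float, baseline: float) -> float:
    """Percent change of `new` relative to `baseline` (assumed reading of 'improvement')."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (new - baseline) / baseline * 100.0


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
baseline_precision = precision(true_positives=40, false_positives=60)
new_precision = precision(true_positives=80, false_positives=20)
improvement = relative_improvement(new_precision, baseline_precision)

print(f"baseline precision: {baseline_precision:.2f}")
print(f"new precision:      {new_precision:.2f}")
print(f"improvement:        {improvement:.1f}%")
```

Under this reading, an improvement above 100% (such as the upper bound of 138.9% quoted in the abstract) means the new precision is more than double the baseline, which is only possible when the baseline precision is below 0.5.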