GageTracker: a tool for dating gene age by micro- and macro-synteny with high speed and accuracy

Chengchi Fang, Chuan Dong, Cheng Wang, Fan Xiong, Suxiang Lu, Wenyu Fang, Tong Li, Xiaoni Gan, Liandong Yang, Honghui Zeng, Shunping He
DOI: https://doi.org/10.1101/2024.08.28.610050
2024-08-29
Abstract: With the advent of the Earth Genome Project, the growing number of sequenced genomes presents exciting opportunities for exploring genetic and phenotypic diversity across organisms. Determining when genes originated helps elucidate the genetic mechanisms underlying major evolutionary questions, such as the transition from aquatic to terrestrial life, the emergence of mammals, the origin of humans, and the development of species- or lineage-specific traits. However, accurately dating genes across species separated by long evolutionary distances remains a major challenge in bioinformatics, because the sequences of these genes often change substantially, making it difficult to trace them back to their origin. Here, we propose a new approach for dating gene age based on micro- and macro-synteny algorithms. The approach computes orthologous genome alignments across multiple species in parallel. We implemented the method in GageTracker (Gene Age Tracker), software that provides a fast and accurate way to trace gene age with minimal user input, available at https://github.com/RiversDong/GageTracker. Benchmarked on the simMammals dataset (Alignathon), GageTracker achieved the same high-quality genome alignments as the optimized LastZ aligner, with a 1.4- to 7-fold improvement in speed. In a separate analysis of 12 Drosophila genomes, GageTracker assessed the ages of 23,720 genes (including ~13,965 protein-coding genes) in ~22 hours with default parameters. Compared with the GenTree database (recognized as the most comprehensive and accurate tool for evaluating gene age), GageTracker achieved ~94.4% accuracy and ~99% macro consistency in assessing the age of protein-coding genes. Moreover, for the ~5.6% of genes on which the two conflicted, GageTracker's age assignments had slightly higher support rates than GenTree's, as evidenced by data from the OrthoDB, FlyBase, and Ensembl ortholog databases. Notably, younger genes identified by GageTracker were preferentially expressed in the testis, further supporting the reliability of GageTracker in tracing gene age.
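The abstract does not spell out how synteny evidence is turned into an age. The sketch below shows one common parsimony convention, offered only as an illustration and not as GageTracker's actual algorithm: a gene is assigned to the branch defined by the most distant outgroup in which a syntenic ortholog is found. The species names, the presence dictionary, and the function `assign_branch` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical outgroups, ordered from the closest to the most distant
# relative of the focal species. These labels are placeholders only.
OUTGROUPS: List[str] = ["sp_A", "sp_B", "sp_C"]


def assign_branch(presence: Dict[str, bool],
                  outgroups: List[str] = OUTGROUPS) -> int:
    """Assign an age branch to one gene by parsimony (illustrative only).

    presence maps an outgroup name to True when a syntenic ortholog of
    the gene was detected in that outgroup. Branch 0 is the oldest
    (ortholog found in the most distant outgroup); branch
    len(outgroups) is the youngest (no ortholog in any outgroup).
    """
    # Scan from the most distant outgroup toward the closest one and
    # stop at the first outgroup that carries a syntenic ortholog.
    for idx in range(len(outgroups) - 1, -1, -1):
        if presence.get(outgroups[idx], False):
            return len(outgroups) - 1 - idx
    # No outgroup has the gene: treat it as focal-species-specific.
    return len(outgroups)
```

Under this convention a gene found only in the closest outgroup falls on a young branch, while a gene found in the most distant outgroup falls on the oldest branch even if it is missing from an intermediate species, since a secondary loss is assumed to be more likely than two independent origins.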
Bioinformatics