Factors influencing the accuracy and precision in dating single gene trees
Guillaume Louvel,Hugues Roest Crollius
DOI: https://doi.org/10.1101/2020.08.24.264671
2024-10-28
Abstract:Molecular dating is the inference of divergence time from genetic sequences. Knowing the time of appearance of a taxon sets the evolutionary context by connecting it with past ecosystems and species. Knowing the divergence times of gene lineages would provide a context to understand adaptation at the genomic level. However, molecular clock inference faces uncertainty due to the variability of the rate of substitution between species, between genes and between sites within genes. When dating speciations, per-lineage rate variability can be informed by fossil calibrations, and gene-specific rates can be either averaged out or modeled by concatenating multiple genes. By contrast when dating gene-specific events, fossil calibrations only inform about speciation nodes and concatenation does not apply to divergences other than speciations. This study aims at benchmarking the accuracy of molecular dating applied to single gene trees, and identify how it is affected by gene tree characteristics. We analyze 5205 alignments of genes from 21 Primates in which no duplication or loss is observed. We also simulated alignments based on characteristics from Primates under a relaxed clock model, to analyze the dating accuracy. Divergence times were estimated with the bayesian program Beast2. From the empirical dataset, we find that the date estimates deviate more from the median age with shorter alignments, high rate heterogeneity between branches and low average rate, features that underlie the amount of dating information in alignments, hence statistical power. The smallest deviation is associated with core biological functions such as ATP binding, cellular organization and anatomical development, categories that are expected to be under strong negative selection.We then investigated the accuracy of dating with simulated alignments, by controlling the three above parameters separately. It confirmed the factors of precision, but also revealed biases when branch rates are highly heterogeneous. This suggests that in the case of the relaxed uncorrelated molecular clock, biases arise from the tree prior when calibrations are lacking and rate heterogeneity is high. Our study finally reports the scale of the gene tree features that influence the dating consistency with median ages, so that comparisons can be made with other genes and taxa. To tackle the molecular dating of events only observed in single gene trees, like deep coalescence, horizontal gene transfers and gene duplications, future models should overcome the lack of power due to limited information from single genes.
Genomics
What problem does this paper attempt to address?