Allele age estimators designed for whole genome datasets show only a modest decrease in accuracy when applied to whole exome datasets

Alyssa Pivirotto, Noah Peles, Jody Hey
DOI: https://doi.org/10.1101/2024.02.01.578465
2024-02-06
Abstract: Personalized genomics in the healthcare system is becoming increasingly accessible as the cost of sequencing decreases. As the number of sequenced genomes grows, larger numbers of rare variants are being discovered, and much work is being done to identify their functional impacts in relation to disease phenotypes. One way to characterize these variants is to estimate the time at which the mutation entered the population. However, allele age estimators such as Relate, Genealogical Estimator of Variant Age, and time of coalescence were developed on the assumption that datasets include the entire genome. We examined the performance of each of these estimators on simulated exome data under a neutral constant population size model and found that each provides usable estimates of allele age from whole-exome datasets. To test the robustness of these methods, we also simulated data under a population expansion model and under background selection. Relate performs best of the three estimators, with Pearson coefficients of 0.64 and 0.68 (neutral constant and expansion population models, respectively) and a 17 percent and 15 percent drop in accuracy between whole-genome and whole-exome estimates. Of the three estimators, Relate is best able to parallelize and yield quick results with few resources; however, even Relate scales only to thousands of samples, which falls short of the hundreds of thousands of samples currently being released. While more work is needed to expand the capabilities of current methods of estimating allele age, these methods estimate the age of mutations from exome data with only a modest decrease in performance.
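The accuracy figures quoted in the abstract are Pearson coefficients between true and estimated allele ages, compared across whole-genome and whole-exome runs. The sketch below illustrates that kind of comparison on synthetic numbers only. It assumes ages are compared on a log scale and that the "drop in accuracy" is the relative decrease in the Pearson coefficient; neither detail is stated in the abstract, and the function names `pearson` and `relative_drop` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def relative_drop(r_genome, r_exome):
    """Relative decrease in correlation going from genome to exome input.

    Assumption: this is one plausible reading of "drop in accuracy",
    not necessarily the definition used in the paper.
    """
    return (r_genome - r_exome) / r_genome


# Synthetic stand-ins for simulation output (not data from the paper):
# true log10 allele ages, plus noisier estimates for the exome case.
rng = random.Random(0)
true_log_age = [rng.uniform(2, 5) for _ in range(500)]
est_genome = [a + rng.gauss(0, 0.5) for a in true_log_age]
est_exome = [a + rng.gauss(0, 0.9) for a in true_log_age]

r_genome = pearson(true_log_age, est_genome)
r_exome = pearson(true_log_age, est_exome)
drop = relative_drop(r_genome, r_exome)

print(f"whole genome r = {r_genome:.2f}")
print(f"whole exome  r = {r_exome:.2f}")
print(f"relative drop  = {drop:.0%}")
```

In a real analysis the true ages would come from the simulation's recorded mutation times and the estimates from each estimator's output; the noise levels here are arbitrary.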
Genomics