A phylogenetic method linking nucleotide substitution rates to rates of continuous trait evolution
Patrick Gemmell, Timothy B. Sackton, Scott V. Edwards, Jun S. Liu
DOI: https://doi.org/10.1371/journal.pcbi.1011995
2024-04-26
PLoS Computational Biology
Abstract: Genomes contain conserved non-coding sequences that perform important biological functions, such as gene regulation. We present a phylogenetic method, PhyloAcc-C, that associates nucleotide substitution rates with changes in a continuous trait of interest. The method takes as input a multiple sequence alignment of conserved elements, continuous trait data observed in extant species, and a background phylogeny and substitution process. Gibbs sampling is used to assign rate categories (background, conserved, accelerated) to lineages and to explore whether the assigned rate categories are associated with increases or decreases in the rate of trait evolution. We test our method using simulations and then illustrate its application using mammalian body size and lifespan data previously analyzed with respect to protein coding genes. Like other studies, we find processes such as tumor suppression, telomere maintenance, and p53 regulation to be related to changes in longevity and body size. In addition, we find that skeletal genes and developmental processes, such as sprouting angiogenesis, are relevant.
Biologists hope to use data from diverse species to identify the genetic basis of continuous traits such as lifespan or beak shape. To do so, they need methodologies that relate genotypic and phenotypic evolution while taking account of the relationships between species. The practice of integrating data from many species in this systematic way is relatively new, and existing approaches to the problem are often ad hoc, focus on protein coding genes, or involve discretizing continuous measurements. We avoid these limitations and develop a statistical model and software package that can be used to rapidly analyze alignments with respect to a continuous trait. Our method is illustrated by describing 136,859 conserved non-coding elements from 61 mammalian species with respect to the trait 'long-lived and large-bodied'. We report on the loci highlighted by our model and describe how our results compare to recent studies taking other methodological approaches. We suggest approaches like ours are an important step towards realizing the potential of data collected from across the animal kingdom, whether the aim is to increase our understanding of natural history or to better understand human biology.
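The idea of tying rate categories to the rate of trait evolution can be illustrated with a toy calculation. The sketch below is not the PhyloAcc-C implementation: it assumes, purely for illustration, that the trait evolves by Brownian motion, that each branch's trait change is observed directly (in a real analysis ancestral states are latent and the categories are sampled, for example by Gibbs sampling), and that each rate category scales the trait variance by a fixed multiplier. All function names, multipliers, and data values are hypothetical.

```python
import math

# Rate categories named after those in the abstract.
BACKGROUND, CONSERVED, ACCELERATED = "background", "conserved", "accelerated"


def branch_log_likelihood(delta, length, sigma2, multiplier):
    """Log density of a trait change `delta` along one branch.

    Assumes Brownian motion: the change is Normal(0, sigma2 * multiplier * length).
    """
    var = sigma2 * multiplier * length
    return -0.5 * (math.log(2.0 * math.pi * var) + delta ** 2 / var)


def tree_log_likelihood(branches, sigma2, multipliers):
    """Sum per-branch log likelihoods; branches are (delta, length, category)."""
    return sum(
        branch_log_likelihood(delta, length, sigma2, multipliers[category])
        for delta, length, category in branches
    )


# Made-up data: branches labelled 'accelerated' show larger trait changes.
branches = [
    (0.10, 1.0, BACKGROUND),
    (-0.20, 1.0, BACKGROUND),
    (0.05, 1.0, CONSERVED),
    (2.00, 1.0, ACCELERATED),
    (-1.80, 1.0, ACCELERATED),
]

# Null model: the rate category has no effect on trait variance.
unlinked = {BACKGROUND: 1.0, CONSERVED: 1.0, ACCELERATED: 1.0}
# Linked model: accelerated lineages evolve the trait faster (hypothetical value).
linked = {BACKGROUND: 1.0, CONSERVED: 1.0, ACCELERATED: 4.0}

ll_unlinked = tree_log_likelihood(branches, sigma2=1.0, multipliers=unlinked)
ll_linked = tree_log_likelihood(branches, sigma2=1.0, multipliers=linked)
print(f"unlinked: {ll_unlinked:.3f}  linked: {ll_linked:.3f}")
```

On this made-up data the linked multipliers give the higher log likelihood, which is the kind of signal a model of this type would read as an association between sequence acceleration and faster trait evolution.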
biochemical research methods,mathematical & computational biology