Phylogenetic signal in primate tooth enamel proteins and its relevance for paleoproteomics

Ricardo Fong Zazueta, Johanna Krueger, David M. Alba, Xènia Aymerich, Robin M. D. Beck, Enrico Cappellini, Guillermo Carrillo Martín, Omar Cirilli, Nathan Clark, Omar E. Cornejo, Kyle Kai-How Farh, Luis Ferrández-Peral, David Juan, Joanna L. Kelley, Lukas F. K. Kuderna, Jordan Little, Joseph D. Orkin, Ryan S. Paterson, Harvinder Pawar, Tomas Marques-Bonet, Esther Lizano
DOI: https://doi.org/10.1101/2024.02.28.580462
2024-02-29
Abstract: Ancient tooth enamel, and to some extent dentin and bone, contains characteristic peptides that persist for long periods of time. In particular, peptides from the enamel proteome (enamelome) have been used to reconstruct the phylogenetic relationships of fossil specimens and to estimate divergence times. However, the enamelome is based on only about 10 genes, whose protein products undergo fragmentation. Moreover, some of the enamelome genes are paralogous or may coevolve. This raises the question of whether the enamelome provides enough information for reliable phylogenetic inference. We address these considerations using a selection of enamel-associated proteins computationally predicted from genomic data from 232 primate species. We created multiple sequence alignments (MSAs) for each protein, estimated the evolutionary rate for each site, and examined which sites overlap with the parts of the protein sequences that are typically recovered from fossils. Based on this, we simulated ancient data with different degrees of sequence fragmentation and ran phylogenetic analyses on the simulated data. We compared the resulting trees to a reference species tree. Up to a degree of fragmentation similar to that of fossil samples from 1-2 million years ago, the phylogenetic placements of most nodes at the family level are consistent with the reference species tree. We found that the composition of the proteome influences the phylogenetic placement of Tarsiiformes. For the inference of molecular phylogenies based on paleoproteomic data, we recommend characterizing the evolution of the proteomes of the closest extant relatives to maximize the reliability of phylogenetic inference.
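As a rough illustration of the fragmentation step described in the abstract, the sketch below masks part of an aligned protein sequence so that only a chosen fraction of contiguous blocks survives, mimicking incomplete peptide recovery. This is not the authors' procedure: the function name `fragment_sequence`, the parameters `keep_fraction` and `block_len`, the use of `?` as a missing-data symbol, and the random choice of surviving blocks are all assumptions made for illustration. The abstract indicates that the real simulation was informed by which sites are typically recovered from fossils, not by random sampling.

```python
import random


def fragment_sequence(seq, keep_fraction, block_len=10, seed=0):
    """Return seq with all but ~keep_fraction of its blocks masked as '?'.

    Hypothetical helper: blocks of block_len aligned positions stand in
    for recovered peptides; which blocks survive is chosen at random.
    """
    rng = random.Random(seed)
    # Number of blocks needed to cover the sequence (ceiling division).
    n_blocks = -(-len(seq) // block_len)
    # Keep at least one block so the sequence is never fully masked.
    n_keep = max(1, round(keep_fraction * n_blocks))
    kept = set(rng.sample(range(n_blocks), n_keep))
    # Positions in surviving blocks keep their residue; others become '?'.
    return "".join(
        residue if (i // block_len) in kept else "?"
        for i, residue in enumerate(seq)
    )


# Toy aligned sequence (40 positions); not real enamelome data.
toy = "MGTWILFACLLGAAFAMPLPPHPGHPGYINFSYEVLTPLK"
print(fragment_sequence(toy, keep_fraction=0.5))
```

Applying such a mask at several values of `keep_fraction` to the sequence of one taxon, and re-inferring the tree each time, gives a simple way to see at what level of missing data a placement starts to diverge from the reference tree.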
Evolutionary Biology