Interpreting mammalian evolutionary constraint at synonymous sites in light of the unwanted transcript hypothesis

Matthew J. Christmas, Michael Dong, Jennifer R. S. Meadows, Sergey V. Kozyrev, Kerstin Lindblad-Toh
DOI: https://doi.org/10.1101/2024.04.23.590689
2024-04-26
Abstract: The unwanted transcript hypothesis presents a potential explanation for cryptic evolutionary constraint at synonymous sites in species with low effective population sizes, such as humans and other mammals. Selection for higher GC content and against mutations that alter splicing in native transcripts is predicted to shape synonymous site content and protect against unwanted transcripts. Here, we interpret mammalian synonymous site constraint in this context. Utilising the largest alignment of 240 placental mammal genomes and single-base resolution constraint scores, we show that 20.8% of four-fold degenerate sites are under significant constraint across mammals. There is a strong bias for guanine (G) and cytosine (C) at constrained sites, marked constraint near splice sites, and variation in human populations shows a bias against mutations that reduce synonymous site GC content. We find evidence for higher constraint on four-fold degenerate sites in species with small historic effective population sizes and high young transposable element genome content. Genes enriched for synonymous site constraint, including those forming CpG sites, are tightly regulated and integral to organismal viability through their involvement in embryo development and transcriptional regulation.
Evolutionary Biology