The GC-content at the 5’ ends of human protein-coding genes is undergoing mutational decay

Yi Qiu, Yoon Mo Kang, Christopher Korfmann, Fanny Pouyet, Andrew Eckford, Alexander F. Palazzo
DOI: https://doi.org/10.1101/2024.03.12.584636
2024-03-14
Abstract: In vertebrates, most protein-coding genes have a peak of GC-content near their 5’ transcriptional start site (TSS). This feature promotes both the efficient nuclear export and translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigated the evolutionary forces shaping GC-content at the TSS of genes, both through comparative genomic analysis of nucleotide substitution rates between different species and by examining human mutations. Our data suggest that GC-peaks at TSSs were present in the last vertebrate common ancestor and are largely dictated by recombination patterns. We observe that in primates and rodents, where recombination is directed away from TSSs by PRDM9, GC-content at protein-coding gene TSSs is currently undergoing mutational decay. In canids, which lack PRDM9 and perform recombination at TSSs, GC-content at protein-coding gene TSSs is increasing. These patterns extend into the open reading frame, affecting protein-coding regions, and we show that changes in GC-content due to recombination affect synonymous codon position choices at the start of the open reading frame. Our results indicate that although high GC-content in protein-coding genes may be shaped by selective pressures to enhance expression, the dynamics of GC-content in mammals are largely shaped by patterns of recombination.
Genomics