The impact of codon choice on translation process in Saccharomyces cerevisiae: folding class, protein function and secondary structure

Daniele Santoni
DOI: https://doi.org/10.1016/j.jtbi.2021.110806
2021-10-07
Abstract: The genetic code consists of a set of rules used by living organisms to translate genomic information, contained in genes, into proteins; every amino acid is coded by a set of nucleotide triplets, or codons. We refer to codon choice as the choice of a given codon, among the available synonymous ones, to code a given amino acid occurrence. The aim of this work is to shed light on the pivotal role that codon choice plays in regulating the timing of the translation process, through patterns of low and high translation efficiency codons. A translation efficiency value, namely the codon score, was associated with each codon through a formula based on the number of tRNA gene copies able to translate the given codon. Using codon scores, the k-mers of the Saccharomyces cerevisiae proteome whose corresponding codons show low or high average scores were computed. The analysis of the distribution of both low and high average score k-mers clearly showed that, in particular for larger k-mer sizes, they occur much more often than expected, strongly suggesting a functional role. Moreover, the analysis highlighted that significant k-mers preferentially occur in some protein folding classes, such as those containing alpha helices, and in some functional classes, mainly those involved in the transcription process, while codon choice seems to have a very low impact in proteins associated with energy production and metabolism. The relationship between secondary structures and significant k-mers was also investigated, revealing that low score k-mers tend to occur preferentially in coil or close to coil regions and almost never in beta sheets, while high score k-mers preferentially occur in alpha helices, avoiding beta sheets, and close to coil regions for large k-mer sizes.
Finally, the analysis of the distribution of significant codon patterns along the proteins highlighted a relevant enrichment of low average score k-mers at the 5' end of protein-coding sequences, in the region from the 5th to the 25th amino acid.
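The k-mer scoring idea described in the abstract can be illustrated with a minimal Python sketch. It slides a window of k codons along a coding sequence, averages the per-codon scores, and flags windows whose average falls below or above chosen cutoffs. Everything named here is an assumption made for illustration: the abstract does not give the codon score formula, so `TOY_SCORES` holds made-up placeholder values rather than scores derived from tRNA gene copy numbers, and the function names and the `low`/`high` cutoffs are hypothetical, not the significance criterion used in the paper.

```python
from typing import Dict, List, Tuple

# Placeholder codon scores, for illustration only. These are NOT the paper's
# values: the actual formula (based on tRNA gene copy numbers) is not stated
# in the abstract.
TOY_SCORES: Dict[str, float] = {
    "ATG": 0.5, "GCT": 0.9, "GCG": 0.1,
    "AAA": 0.4, "AAG": 0.8, "CGG": 0.1,
}


def split_codons(cds: str) -> List[str]:
    """Split a coding sequence into codons, dropping a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def kmer_average_scores(
    cds: str, scores: Dict[str, float], k: int
) -> List[Tuple[int, float]]:
    """Return (start codon index, average score) for every window of k codons."""
    codons = split_codons(cds)
    averages = []
    for start in range(len(codons) - k + 1):
        window = codons[start:start + k]
        averages.append((start, sum(scores[c] for c in window) / k))
    return averages


def flag_kmers(
    averages: List[Tuple[int, float]], low: float, high: float
) -> Tuple[List[int], List[int]]:
    """Split window start positions into low-score and high-score lists.

    The cutoffs are hypothetical; the paper's own criterion may differ.
    """
    low_hits = [pos for pos, avg in averages if avg <= low]
    high_hits = [pos for pos, avg in averages if avg >= high]
    return low_hits, high_hits


# Toy usage: six codons, windows of k = 3.
toy_cds = "ATGGCTAAGGCGCGGAAA"
toy_averages = kmer_average_scores(toy_cds, TOY_SCORES, k=3)
toy_low, toy_high = flag_kmers(toy_averages, low=0.35, high=0.7)
```

Indexing windows by codon position keeps the output directly comparable with amino acid positions, which is how the abstract reports the enrichment near the start of the coding sequence.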