Linguistic mechanism of the evolution of amino acid frequencies and genomic GC content

Dirson Jian Li
DOI: https://doi.org/10.48550/arXiv.q-bio/0612010
IF: 4.31
2006-12-06
Genomics
Abstract: Much information is stored in the amino acid composition of proteins and the base composition of DNA. We simulated the evolution of amino acid frequencies and genomic GC content using a linguistic model. We show that the evolution of the genetic code determines the evolution of amino acid frequencies and genomic GC content. We explain the relationships among amino acid frequencies, genomic GC content, and protein length distribution in a unified theoretical framework. In particular, the simulated evolution of amino acid frequencies and of codon-position GC content agrees closely with results based on data from all genomes known so far. Furthermore, we found that the space spanned by the average protein length in a proteome and the ratio of amino acid frequencies is useful for describing phylogeny and evolution: the points representing all species in this space form an evolutionary flow. We believe that amino acid gain and loss is driven by the established pattern of variation in amino acid frequencies. The linguistic mechanism helps to unveil the origin of the genetic code.