Information Theory of Genomes

Dmitri V. Parkhomchuk
DOI: https://doi.org/10.48550/arXiv.q-bio/0612038
2007-01-13
Abstract: The relation of genome size to organismal complexity is still described rather equivocally. Neither the number of genes (G-value) nor the total amount of DNA (C-value) correlates consistently with phenotypic complexity. Using information-theoretic considerations, we developed a model that allows a quantitative estimate of the amount of functional information in a genomic sequence. This model easily answers the long-standing question of why GC content is increased in functional regions. The model allows a consistent estimate of genome complexities, resolving the major discrepancies of G- and C-values. For related organisms with similarly complex phenotypes, this estimate provides biological insights into the complexity of their niches. This theoretical framework suggests that biological information can rapidly evolve on demand from the environment, mainly in non-coding genomic sequence, and explains the role of duplications in the evolution of biological information. Knowing the approximate amount of functionality in a genomic sequence is useful for many applications, such as phylogenetic analyses, in silico discovery of functional elements, or prioritising targets for genotyping and sequencing.
Genomics, Populations and Evolution
What problem does this paper attempt to address?
The paper addresses the unclear relationship between genome size and organismal complexity. Specifically, neither the number of genes (G-value) nor the total amount of DNA (C-value) correlates consistently with phenotypic complexity. The author develops an information-theoretic model that gives a quantitative estimate of the amount of functional information in a genomic sequence. According to the abstract, the model explains why GC content is increased in functional regions and provides a consistent estimate of genome complexity, resolving the major discrepancies of G- and C-values. For related organisms with similarly complex phenotypes, the estimate offers biological insight into the complexity of their ecological niches. The framework also suggests that biological information can evolve rapidly in response to environmental demands, mainly in non-coding genomic sequence, and it explains the role of duplications in the evolution of biological information.