Minimal model for genome evolution and growth

L.C. Hsieh, L.F. Luo, F.M. Ji, H.C. Lee
DOI: https://doi.org/10.1103/PhysRevLett.90.018101
2002-06-11
Abstract: Textual analysis of typical microbial genomes reveals that they have the statistical characteristics of a DNA sequence of a much shorter length. This peculiar property supports an evolutionary model in which a genome evolves by random mutation but primarily grows by random segmental self-copying. That genomes grew mostly by self-copying is consistent with the observation that repeat sequences are widespread in all genomes and that intragenomic and intergenomic homologous genes are preponderant across all life forms. The model predicts the coexistence of the two competing modes of evolution: the gradual changes of classical Darwinism and the stochastic spurts envisioned in "punctuated equilibrium".
Biological Physics, Genomics
What problem does this paper attempt to address?
The paper seeks to understand how microbial genomes evolve and grow, and in particular why modern microbial genomes have the statistical properties of much shorter DNA sequences. By analyzing the textual features of typical microbial genomes, the authors found that their k-mer (oligonucleotide of length k) frequency distributions differ significantly from those of simple random sequences.

### Main problems

1. **Why is the k-mer distribution of microbial genomes different from that of simple random sequences?**
   - The 6-mer distribution of modern microbial genomes has an abnormally high standard deviation, with large numbers of both high-frequency and low-frequency 6-mers. This is inconsistent with the Poisson distribution expected for a simple random sequence.
2. **What is the origin of these statistical properties?**
   - The authors hypothesize that genomes grow mainly through segmental self-copying rather than simple random mutation. Self-copying may cause a genome to retain the statistical features of its early, shorter ancestors.

### Solutions

The authors propose a minimal model that combines two events driving genome change: Single Base Replacement (SBR) and Random Duplication (RD). By simulating these two events, they show how to generate genomes whose k-mer distributions resemble those of real microbial genomes.

### Key points of the model

- **Initial state**: The genome is a simple random sequence of length \( L_0 \).
- **Evolution process**: The genome grows to more than 1 Mb through SBR and RD events.
- **Parameters**:
  - \( L_0 \): Initial genome length.
  - \( \eta \): Probability ratio of SBR to RD events.
  - \( \sigma \): Characteristic length of the duplicated segment.
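
The growth process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The summary does not state several details, so the following are assumptions: the function name `grow_genome` is hypothetical, an SBR event is chosen with probability \( \eta / (1 + \eta) \), the duplicated segment length is drawn from an exponential distribution with mean \( \sigma \), and the copied segment is inserted at a uniformly random position.

```python
import random

BASES = "ACGT"


def grow_genome(l0, target_len, eta, sigma, seed=0):
    """Grow a random sequence of length l0 to at least target_len bases.

    Assumptions (not taken from the paper): P(SBR) = eta / (1 + eta),
    segment lengths are exponential with mean sigma, and the copy is
    inserted at a uniformly random position.
    """
    rng = random.Random(seed)
    # Initial state: a simple random sequence of length l0.
    genome = [rng.choice(BASES) for _ in range(l0)]
    p_sbr = eta / (1.0 + eta)

    while len(genome) < target_len:
        if rng.random() < p_sbr:
            # Single Base Replacement: overwrite one site with a random base
            # (the new base may happen to equal the old one).
            pos = rng.randrange(len(genome))
            genome[pos] = rng.choice(BASES)
        else:
            # Random Duplication: copy a segment and insert it elsewhere.
            seg_len = round(rng.expovariate(1.0 / sigma))
            seg_len = min(len(genome), max(1, seg_len))
            start = rng.randrange(len(genome) - seg_len + 1)
            segment = genome[start:start + seg_len]
            insert_at = rng.randrange(len(genome) + 1)
            genome[insert_at:insert_at] = segment

    return "".join(genome)


if __name__ == "__main__":
    # Small illustrative run; the parameter values are arbitrary.
    g = grow_genome(l0=100, target_len=5000, eta=1.0, sigma=50.0, seed=1)
    print(len(g), g[:60])
```

Only duplication events lengthen the genome, so larger values of `eta` mean more point mutations per unit of growth. List insertion is linear in the genome length, which is adequate for a sketch but would be slow for many runs at megabase scale.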
### Results

By adjusting the parameters \( \eta \) and \( \sigma \), the model generates genomes that reproduce the 6-mer distribution of real microbial genomes well, especially the 6-mers with abnormally high and low frequencies. The model also predicts the coexistence of two modes of evolution: the gradual changes of classical Darwinism and the stochastic spurts envisioned in "punctuated equilibrium".

### Conclusions

The authors conclude that genomes grow mainly through segmental self-copying rather than simple random mutation. This mechanism explains the unusual statistical properties of microbial genomes and implies that a genome may retain features of its early, shorter ancestors as it evolves. The finding helps clarify the evolutionary history of genomes and its potential biological significance.
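
The k-mer statistic that the summary relies on can be illustrated with a short Python sketch. It counts all overlapping k-mers in a sequence and compares the observed standard deviation of the counts with the Poisson expectation \( \sqrt{\bar{n}} \), where \( \bar{n} = (L - k + 1) / 4^k \) is the mean count for a sequence of length \( L \). The function name `kmer_count_stats` is hypothetical, and the exact statistic used by the authors is not specified here, so treat this as one plausible way to make the comparison.

```python
import random
from collections import Counter
from itertools import product
from math import sqrt


def kmer_count_stats(seq, k=6):
    """Return (mean, observed_std, poisson_std) of k-mer counts in seq.

    All 4**k possible k-mers over ACGT are included, so k-mers that
    never occur contribute a count of zero.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    values = [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # For a Poisson distribution the standard deviation is sqrt(mean).
    return mean, sqrt(variance), sqrt(mean)


if __name__ == "__main__":
    rng = random.Random(0)
    # A simple random sequence versus one built by repeating a short segment.
    random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
    unit = "".join(rng.choice("ACGT") for _ in range(100))
    repeated_seq = unit * 200

    for name, seq in [("random", random_seq), ("repeated", repeated_seq)]:
        mean, std, poisson_std = kmer_count_stats(seq, k=3)
        print(name, round(mean, 2), round(std, 2), round(poisson_std, 2))
```

For a simple random sequence the observed standard deviation should be close to the Poisson value, while a sequence dominated by copies of a short segment should show a much larger spread, which is the signature the summary attributes to real microbial genomes.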