Emergence of the protein universe in organismal evolution

Konstantin B. Zeldovich, Boris E. Shakhnovich, Eugene I. Shakhnovich
DOI: https://doi.org/10.48550/arXiv.q-bio/0605044
2007-01-18
Abstract: In this work we propose a physical model of organismal evolution, where phenotype, organism life expectancy, is directly related to genotype, i.e., the stability of its proteins, which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the Big Bang scenario whereby exponential population growth ensues as favorable sequence-structure combinations (precursors of stable proteins) are discovered. After that, random diversity of the structural space abruptly collapses into a small set of preferred structural motifs. We observe that protein folds remain stable and abundant in the population at time scales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary time scales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. The network of structural similarities of the universe of evolved proteins has the same scale-free-like character as the actual protein domain universe graph (PDUG). Further, the model predicts that ancient protein domains represent a highly connected and clustered subset of all protein domains, in complete agreement with reality. Together, these results provide a microscopic first-principles picture of how protein structures and gene families evolved in the course of evolution.
Populations and Evolution, Biomolecules
What problem does this paper attempt to address?
This paper seeks to explain, at the molecular level, how protein structures and gene families form in the course of evolution. Specifically, the authors propose a physical model of organismal evolution in which the phenotype (the life expectancy of an organism) is directly tied to the genotype (the stability of its proteins, which can be determined exactly in the model). In computer simulations they consistently observe a "Big Bang" scenario: once favorable sequence-structure combinations (precursors of stable proteins) are discovered, the population grows exponentially. The random diversity of the structural space then abruptly collapses into a small set of preferred structural motifs. Protein folds remain stable and abundant in the population on time scales much longer than the mutation time or the organism lifetime, and the lifetimes of dominant folds approximately follow a power-law distribution. The paper attributes the emergence of protein families and superfamilies to the separation of time scales between the discovery of new folds and the generation of new sequences; the sizes of these families are also power-law distributed, closely matching the distributions for real proteins. The model further predicts that ancient protein domains form a highly connected and clustered subset of all protein domains, in agreement with observations. Together, these results give a microscopic, first-principles picture of how protein structures and gene families evolved.
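To make the phrase "power-law distributed" concrete, the following is a minimal sketch of how one might estimate a power-law exponent from a list of family sizes. It is not the authors' analysis: the function names, the exponent value 2.5, the cutoff `x_min`, and the use of synthetic data are all assumptions chosen for illustration, and sizes are treated as continuous values even though real family sizes are integers.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic 'family sizes' from p(x) ~ x^(-alpha), x >= x_min.

    Uses inverse-transform sampling. The data are synthetic stand-ins,
    not results from the paper.
    """
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so (1 - u) lies in (0, 1] and is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_exponent(sizes, x_min=1.0):
    """Maximum-likelihood estimate of alpha for a continuous power law.

    alpha_hat = 1 + n / sum(ln(x_i / x_min)), using only x_i >= x_min.
    """
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical exponent, chosen only to exercise the estimator.
sizes = sample_power_law(20000, alpha=2.5)
alpha_hat = estimate_exponent(sizes)
print(round(alpha_hat, 2))
```

With enough samples the estimate should land close to the exponent used to generate the data. On real family-size data one would also need to choose `x_min` carefully and compare the fit against alternative distributions before calling it a power law.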