A simple stochastic model for the evolution of protein lengths

C. Destri, C. Miccio
DOI: https://doi.org/10.48550/arXiv.q-bio/0703054
2007-03-26
Populations and Evolution
Abstract: We analyse a simple discrete-time stochastic process for the theoretical modeling of the evolution of protein lengths. At every step of the process a new protein is produced as a modification of one of the already existing proteins, and its length is assumed to be a random variable that depends only on the length of the originating protein. Thus a Random Recursive Tree (RRT) is produced over the natural integers. If (quasi) scale invariance is assumed, the length distribution in a single history tends to a lognormal form with a specific signature of the deviations from exact Gaussianity. Comparison with the very large SIMAP protein database shows good agreement.
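The process described in the abstract can be illustrated with a minimal simulation sketch. Several choices here are assumptions, not taken from the paper: the parent is picked uniformly among existing proteins (the usual RRT convention), scale invariance is modeled by multiplying the parent length by a lognormal factor, and the names `simulate_lengths`, `root_length`, and `sigma`, along with their default values, are hypothetical.

```python
import math
import random


def simulate_lengths(n_steps, root_length=300, sigma=0.3, seed=0):
    """Grow one history of protein lengths as a random recursive tree.

    Assumptions (not from the paper): uniform parent choice and a
    lognormal multiplicative kernel with log-scale spread `sigma`.
    """
    rng = random.Random(seed)
    lengths = [root_length]  # the tree starts from a single root protein
    for _ in range(n_steps):
        # Pick the originating protein uniformly among existing ones.
        parent = rng.choice(lengths)
        # Scale-invariant step: the child length depends on the parent
        # only through a random multiplicative factor.
        factor = math.exp(rng.gauss(0.0, sigma))
        # Lengths live on the natural integers, so round and keep >= 1.
        child = max(1, round(parent * factor))
        lengths.append(child)
    return lengths


lengths = simulate_lengths(10_000)

# Summary statistics of log-length; a lognormal distribution of lengths
# corresponds to an approximately Gaussian distribution of these logs.
logs = [math.log(x) for x in lengths]
mean_log = sum(logs) / len(logs)
var_log = sum((v - mean_log) ** 2 for v in logs) / len(logs)
print(f"mean log-length = {mean_log:.3f}, variance = {var_log:.3f}")
```

A histogram of `logs` from a single run gives a rough visual check of how close one history comes to a Gaussian shape under these assumed parameters.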