Microbial genome as a fluctuating system: Distribution and correlation of coding sequence lengths

V. V. Morariu
DOI: https://doi.org/10.48550/arXiv.0805.4315
IF: 4.31
2008-05-28
Genomics
Abstract: The series of coding sequence lengths in microbial genomes was regarded as a fluctuating system and characterized by the methods of statistical physics. The distribution and correlation properties of 50 genomes, including bacteria and several archaea, were investigated. The distribution was investigated by rank-size analysis (Zipf's law). We found that coding sequence length series do not obey Zipf's law, in contrast to natural languages. The distribution was found to be closer to an exponential distribution. The correlation appeared to be similar to that of natural languages. Segmentation analysis showed the series to be short-range memory systems.
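As an illustration of the rank-size analysis mentioned in the abstract, the following is a minimal sketch, not the author's procedure. It sorts lengths in descending order and compares two straight-line fits against log(rank): log(size), which is linear under Zipf's law, and size itself, which is approximately linear for an exponential distribution. The function names, the use of R² as the comparison measure, and the synthetic input (exponentially distributed lengths with an assumed mean of about 900 bases) are all hypothetical choices made for the example; no genome data from the paper is used.

```python
import math
import random


def linear_fit_r2(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def rank_size_fits(lengths):
    """Compare a Zipf-type fit with an exponential-type fit on rank-size data."""
    sizes = sorted(lengths, reverse=True)            # rank 1 = longest sequence
    log_ranks = [math.log(r) for r in range(1, len(sizes) + 1)]
    # Zipf's law: log(size) is linear in log(rank).
    zipf_slope, _, zipf_r2 = linear_fit_r2(log_ranks, [math.log(s) for s in sizes])
    # Exponential distribution: size is approximately linear in log(rank).
    _, _, expo_r2 = linear_fit_r2(log_ranks, sizes)
    return {"zipf_slope": zipf_slope, "zipf_r2": zipf_r2, "exponential_r2": expo_r2}


# Synthetic stand-in for a genome's coding sequence lengths (assumption, not real data):
# multiples of 3 bases drawn from an exponential distribution.
rng = random.Random(0)
synthetic_lengths = [3 * (1 + int(rng.expovariate(1 / 300))) for _ in range(3000)]

result = rank_size_fits(synthetic_lengths)
print(result)
```

For exponentially distributed input such as this, the fit of size against log(rank) should describe the data better than the Zipf-type fit, which is the kind of comparison the abstract summarizes. Real coding sequence lengths would have to be read from genome annotations.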