Scaling behaviors of CG clusters in coding and noncoding DNA sequences

Linxi Zhang, Jin Chen
DOI: https://doi.org/10.1016/j.chaos.2004.07.013
2005-01-01
Abstract: This paper investigates the statistical properties of CG clusters in coding and non-coding DNA sequences using two quantities: the cluster-size distribution P(S), and the breadth σ_m of the distribution of the root-mean-square size of CG clusters in consecutive, non-overlapping blocks of m bases. Coding and non-coding sequences differ in both. For both sequence types, P(S) decays exponentially, P(S) ∝ e^(−αS). For coding sequences, α depends on the percentage of C–G content and the dependence is well fitted by a straight line; for non-coding sequences no such regular linear relationship is found. The quantity ξ(m) = σ_m/m follows a power-law decay, ξ(m) ∝ m^(−γ), in both coding and non-coding sequences, with γ = 0.949 ± 0.014 and γ = 0.826 ± 0.011, respectively, so the value of γ can be used to distinguish coding from non-coding sequences. For a random sequence the same power law holds, with γ very close to 1.00. The value of γ for coding sequences is therefore closer to that of a random sequence, which suggests that coding sequences behave more like random sequences than non-coding sequences do. These results provide some insight into the statistical structure of DNA sequences.
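The two measurements named in the abstract can be sketched in Python as follows. The abstract does not define a CG cluster or the per-block statistic precisely, so this sketch makes several assumptions: a cluster is taken to be a maximal run of consecutive C or G bases, the block statistic is the root-mean-square of the cluster sizes found inside each block, and σ_m is the population standard deviation of that statistic across blocks. All function names are hypothetical, and the exponents this sketch produces should not be expected to reproduce the values reported in the paper.

```python
import math
from collections import Counter
from statistics import pstdev


def cg_cluster_sizes(seq):
    """Lengths of maximal runs of consecutive C/G bases (assumed cluster definition)."""
    sizes, run = [], 0
    for base in seq.upper():
        if base in "CG":
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:  # close a run that reaches the end of the sequence
        sizes.append(run)
    return sizes


def cluster_size_distribution(seq):
    """P(S): fraction of CG clusters that have size S."""
    counts = Counter(cg_cluster_sizes(seq))
    total = sum(counts.values())
    return {s: c / total for s, c in sorted(counts.items())}


def xi(seq, m):
    """xi(m) = sigma_m / m over consecutive, non-overlapping blocks of m bases.

    Assumption: sigma_m is the standard deviation, across blocks, of the
    RMS cluster size in each block. Blocks without any CG cluster are skipped.
    """
    rms_values = []
    for start in range(0, len(seq) - m + 1, m):
        sizes = cg_cluster_sizes(seq[start:start + m])
        if sizes:
            rms_values.append(math.sqrt(sum(s * s for s in sizes) / len(sizes)))
    return pstdev(rms_values) / m


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def fit_alpha(dist):
    """alpha from P(S) ∝ exp(-alpha * S): minus the slope of ln P(S) against S."""
    return -slope(list(dist), [math.log(p) for p in dist.values()])


def fit_gamma(seq, block_sizes):
    """gamma from xi(m) ∝ m**(-gamma): minus the slope of ln xi(m) against ln m.

    Needs a sequence long enough that xi(m) > 0 for every m in block_sizes.
    """
    log_m = [math.log(m) for m in block_sizes]
    log_xi = [math.log(xi(seq, m)) for m in block_sizes]
    return -slope(log_m, log_xi)
```

For a sequence string `seq`, `fit_alpha(cluster_size_distribution(seq))` estimates α and `fit_gamma(seq, [16, 32, 64, 128])` estimates γ; the block sizes here are illustrative and are not taken from the paper.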