Long-range correlations in DNA sequences using 2D DNA walk based on pairs of sequential nucleotides
Linxi Zhang, Zhouting Jiang
DOI: https://doi.org/10.1016/j.chaos.2004.03.012
2004-01-01
Abstract: We study long-range correlations in DNA sequences using a two-dimensional (2D) DNA walk model based on pairs of sequential nucleotides. The model accounts for the effect of second-order correlations in DNA sequences on long-range correlations. Double logarithmic plots of the mean square distance $\langle R^2(l)\rangle$ and the fluctuation $F(l)$ versus the nucleotide distance $l$ along the DNA chain show linear regions, so that the two quantities may be expressed as $\langle R^2(l)\rangle \sim l^{\gamma}$ and $F(l) \sim l^{H}$. The power spectra of several sequential nucleotide pairs are also studied; the curves are flat and change little when the frequency $f < 10^{-1}$. Compared with the other 12 curves for different sequential nucleotide pairs (i.e. AT, AC, AG, CA, CG, CT, GA, GC, GT, TA, TC, and TG), the curves for GG, CC, TT and AA decrease markedly in the high-frequency region ($f > 0.15\ \mathrm{bp}^{-1}$). A notable peak occurs at a frequency of 0.333 for coding DNA sequences, while no such peak is found for non-coding DNA sequences. The autocorrelation function $C(l)$ is also calculated. Each curve in the double logarithmic plot is almost linear in the low-$l$ region, especially for Escherichia coli genomic DNA. For non-coding DNA sequences, the curves have nonlinear tails when $l > 2.0\times10^{4}$.
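
The quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the rule that maps a nucleotide pair to a 2D step, so the purine/pyrimidine rule below is an assumption made only for illustration and is not the authors' mapping. All function names (`pair_walk`, `mean_square_distance`, `loglog_slope`, `pair_power_spectrum`) are hypothetical. The sketch builds a 2D walk from sequential pairs, estimates the exponent in $\langle R^2(l)\rangle \sim l^{\gamma}$ from a log-log fit, and computes a plain discrete Fourier power spectrum of the indicator sequence for one nucleotide pair.

```python
import cmath
import math

PURINES = {"A", "G"}  # assumption: steps depend only on purine vs pyrimidine


def pair_walk(seq):
    """Return cumulative 2D positions; step i is set by the pair (seq[i], seq[i+1]).

    Assumed rule (not from the paper): the first nucleotide of the pair sets
    the x step and the second sets the y step, +1 for a purine, -1 otherwise.
    """
    x, y = 0, 0
    path = [(0, 0)]
    for first, second in zip(seq, seq[1:]):
        x += 1 if first in PURINES else -1
        y += 1 if second in PURINES else -1
        path.append((x, y))
    return path


def mean_square_distance(path, l):
    """Average squared displacement over all windows of l steps along the walk."""
    n = len(path) - l
    total = 0
    for i in range(n):
        dx = path[i + l][0] - path[i][0]
        dy = path[i + l][1] - path[i][1]
        total += dx * dx + dy * dy
    return total / n


def loglog_slope(ls, values):
    """Least-squares slope of log(values) against log(ls)."""
    xs = [math.log(l) for l in ls]
    ys = [math.log(v) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def pair_power_spectrum(seq, pair):
    """Return (frequency, power) tuples for the indicator sequence of `pair`.

    The indicator is 1 where seq[i:i+2] == pair and 0 elsewhere; the mean is
    removed and a direct O(n^2) DFT is used, so keep sequences short.
    """
    u = [1.0 if seq[i:i + 2] == pair else 0.0 for i in range(len(seq) - 1)]
    n = len(u)
    mean = sum(u) / n
    u = [v - mean for v in u]
    spectrum = []
    for k in range(1, n // 2 + 1):
        s = sum(u[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        spectrum.append((k / n, abs(s) ** 2 / n))
    return spectrum


if __name__ == "__main__":
    # Toy sequence with an exact period of 3, loosely mimicking codon structure.
    toy = "ATG" * 100 + "A"
    ls = [1, 2, 4, 8, 16]
    path = pair_walk(toy)
    gamma = loglog_slope(ls, [mean_square_distance(path, l) for l in ls])
    peak_f, _ = max(pair_power_spectrum(toy, "AT"), key=lambda t: t[1])
    print("estimated gamma:", gamma)
    print("peak frequency:", peak_f)
```

On a strictly period-3 toy sequence the spectrum of any pair indicator concentrates at $f = 1/3$, which is the kind of peak the abstract reports for coding sequences. Real genomic data would need a faster transform and spectrum averaging, which this sketch leaves out.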