Statistical Properties of Nucleotides in Human Chromosomes 21 and 22

LX Zhang, TT Sun
DOI: https://doi.org/10.1016/j.chaos.2004.06.022
2005-01-01
Abstract: In this paper the statistical properties of nucleotides in human chromosomes 21 and 22 are investigated using n-tuple Zipf analysis with n = 3, 4, 5, 6, and 7. It is found that the most common n-tuples are those that consist only of adenine (A) and thymine (T), and the rarest n-tuples are those in which the GC or CG pattern appears twice. As the n-tuples become more frequent, the double GC or CG pattern gives way to a single GC or CG pattern. The percentages of the four nucleotides in the ten rarest and the ten most common n-tuples are also considered for human chromosomes 21 and 22, and the four nucleotides are found to behave differently. The frequency of appearance of an n-tuple, f(r), as a function of rank r is also examined. We find that the n-tuple Zipf plot shows power-law behavior for r < 4^(n−1) and a rapid decrease for r > 4^(n−1). To explore the internal statistical properties of human chromosomes 21 and 22 in detail, we divide the chromosome sequence into moving windows and discuss the percentage of each ξη pair (ξ, η = A, C, G, T) in those windows. In some particular regions there are obvious changes in the percentage of ξη pairs, which may indicate functional differences. The normalized number of repeats N_0(l) can be described by a power law: N_0(l) ∼ l^(−μ). The distance distributions P_0(S) between two nucleotides in human chromosomes 21 and 22 are also discussed. These distributions are well fitted by a second-order polynomial, log P_0(S) = a + bS + cS^2, which is quite different from the behavior of a random sequence.
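The n-tuple Zipf analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name ntuple_zipf, the use of overlapping windows, the skipping of windows that contain symbols other than A, C, G, T, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter

VALID = set("ACGT")

def ntuple_zipf(sequence, n):
    """Return (n-tuple, count) pairs sorted from most to least frequent.

    Assumption: n-tuples are read from overlapping windows that slide one
    base at a time; windows containing anything other than A, C, G, T
    (for example N in an assembly gap) are skipped.
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + n]
        for i in range(len(seq) - n + 1)
        if set(seq[i:i + n]) <= VALID
    )
    return counts.most_common()

# Toy sequence for illustration only; this is not data from the paper.
toy = "ATATATGCGCATTTAAAT"
ranked = ntuple_zipf(toy, 3)
total = sum(count for _, count in ranked)

# Print rank r, the n-tuple, and its relative frequency f(r).
for rank, (tup, count) in enumerate(ranked[:5], start=1):
    print(rank, tup, count / total)
```

Plotting f(r) against r on log-log axes for a real chromosome sequence would give the kind of Zipf plot the abstract refers to, where a power law appears as a straight line.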