Rules of 8-mer usage in genome sequences and its relation to genome evolution

Xiaoxian ZHU, Zhen YANG, Chengyan DUAN, Lü Wenping, Hong LI
DOI: https://doi.org/10.3969/j.issn.1672-5565.2016.04.01
2016-01-01
Abstract: The rules governing non-random k-mer usage in genome sequences, and their biological significance, are important open problems, and the underlying mechanism remains unclear. Based on seven genome sequences, the distributions of the 8-mer frequency spectra were obtained. The 8-mer spectra of dog and cow are trimodal, whereas those of zebrafish, medaka, nematode and yeast are unimodal. For the chicken genome, the 8-mer spectrum is intermediate between the two patterns. When the 8-mer set is classified into three subsets according to XY dinucleotide content, only the CG dinucleotide classification yields subsets (0CG, 1CG and 2CG) that each form an independent, unimodal distribution. Comparison with random sequences indicates that 0CG motifs are the result of random evolution, whereas 1CG and 2CG motifs are the result of directed evolution, with frequencies far below the random expectation. This independent separation of the three CG subsets holds across species. The results indicate that the main reason 8-mer spectra are unimodal in some species and multimodal in others is the difference in the distances between the three CG spectra. When the seven genome sequences are normalized to 10^9 bp, the spectra of 1CG and 2CG motifs correlate significantly with genome evolution, whereas the spectrum of 0CG motifs shows no obvious relation to genome evolution. We suggest that the three CG motif classes have different biological functions. The independent separation of the three CG subsets offers a novel approach to studying genome structure and evolution, and a method for revealing functional elements in genome sequences.
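
The classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`count_kmers`, `cg_class`, `spectra_by_cg`) are hypothetical, windows containing characters other than A, C, G, T are skipped by assumption, and 8-mers with more than two CG dinucleotides are grouped into the 2CG class by assumption, since the abstract does not say how they are handled. Here a "spectrum" is taken to mean the number of distinct 8-mers observed at each frequency.

```python
from collections import Counter

def count_kmers(seq, k=8):
    """Count overlapping k-mers in seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def cg_class(kmer):
    """Return 0, 1 or 2 for the 0CG, 1CG and 2CG subsets.

    Assumption: k-mers with more than two CG dinucleotides are
    placed in the 2CG class.
    """
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return min(kmer.count("CG"), 2)

def spectra_by_cg(counts):
    """Map each CG class to a spectrum: frequency -> number of distinct k-mers."""
    spectra = {0: Counter(), 1: Counter(), 2: Counter()}
    for kmer, freq in counts.items():
        spectra[cg_class(kmer)][freq] += 1
    return spectra

if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # toy sequence, not real genome data
    print(spectra_by_cg(count_kmers(toy)))
```

On a real genome, the three per-class spectra could be plotted separately to check whether each is unimodal. 8-mers that never occur are absent from the counts in this sketch, so a zero-frequency bin would have to be added explicitly if needed.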