Classification of CpG Islands in the Human Genome Based on the Interval Distance Distribution of Adjacent CG Sites

Changle Qi, Xiaoming Wu, Lili Liu, Jianqiang Du, Bo Wang
DOI: https://doi.org/10.1109/CSIE.2009.822
2009-01-01
Abstract: Many studies have analyzed the relationship between CpG islands and gene function. Most have shown that the promoters of many housekeeping genes contain CpG islands; however, the relationship between gene function and the positions of CG dinucleotides within CpG islands has received less attention. In this study, we classify CpG islands according to the interval distance distribution of adjacent CG sites and look for functional correlations. First, human genome sequences were downloaded from the EMBL Nucleotide Sequence Database. Then a dataset was constructed in which each record is the interval distance distribution of adjacent CG sites in one CpG island. Finally, an algorithm was designed to calculate the approximate minimal difference between any two records. Using this algorithm with hierarchical clustering, we obtained many classes, each containing similar CpG islands, and studied some of their common features.
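The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' difference algorithm or clustering settings, so everything below is an assumption made for illustration: the function names are hypothetical, the difference between two records is taken to be a plain L1 distance between normalized histograms, and the clustering is a naive single-linkage merge with a distance threshold. It is not the method used in the paper.

```python
from collections import Counter


def cg_positions(seq):
    """Return the 0-based start position of every CG dinucleotide in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def interval_distribution(seq):
    """Normalized histogram of distances between adjacent CG sites.

    One such histogram plays the role of one dataset record.
    """
    pos = cg_positions(seq)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    if not gaps:
        return {}
    counts = Counter(gaps)
    return {d: c / len(gaps) for d, c in counts.items()}


def l1_difference(p, q):
    """Assumed metric: sum of absolute differences over all interval lengths."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def single_linkage(records, threshold):
    """Naive agglomerative clustering (assumed single linkage).

    Repeatedly merges the two closest clusters until the closest pair
    is farther apart than threshold. Returns lists of record indices.
    """
    clusters = [[i] for i in range(len(records))]

    def dist(a, b):
        return min(l1_difference(records[i], records[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (dist(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy sequences standing in for CpG islands (not real genomic data).
islands = ["CGCGCG", "CGCGCGCG", "CGACGACG"]
records = [interval_distribution(s) for s in islands]
classes = single_linkage(records, threshold=0.5)
```

In this toy input the first two sequences have all adjacent CG sites 2 bases apart and the third has them 3 bases apart, so the first two fall into one class. For real data, a library implementation of hierarchical clustering would be a more practical choice than this quadratic loop.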