An Identification of the Model Species Genomes

CHEN Cui-xia, LI Qian-zhong
DOI: https://doi.org/10.3969/j.issn.1000-1638.2005.04.012
2005-01-01
Abstract: Three properties are used to divide complete eukaryotic genome sequences into three kinds of sequence (introns, exons, and intergenic DNA): the conservation of nucleotides around splice sites, compositional features, and the 3-periodic reading frame of coding sequences. Two standard sources of diversity are defined. The first uses the probabilities of the 64 trimers over the whole sequence. The second uses the probabilities of the 4 bases at each of 30 positions around the splice sites. A sequence is assigned to the class that gives the least increment of diversity. The results show higher rates of correct prediction on both the standard sets and the test sets when the densities of the 64 trimers are combined with the 120 position-specific bases (4 bases × 30 positions). These rates are better than those obtained with the 64 trimers alone, in terms of both sensitivity (Sn) and specificity (Tn). The overall rates are as follows: C. elegans 88.37%, S. cerevisiae 90.72%, A. thaliana 91.08%, D. melanogaster 92.28%, E. coli 92.88%. Analysis of the falsely predicted sequences shows some similarity between the two kinds of sequences involved (the positive and the false).
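The classification rule ("least increment of diversity") can be illustrated with a small sketch. It assumes the diversity measure commonly used with this method, D = N log N − Σ nᵢ log nᵢ, and the increment ID(X, S) = D(X + S) − D(X) − D(S). Only the 64-trimer component is shown. The function names (`trimer_counts`, `diversity`, `increment_of_diversity`, `classify`) and the toy class sources are hypothetical and are not taken from the paper. The 120 position-specific base counts around the splice sites are not modelled here.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 trimers in a fixed order, so every count vector has the same layout.
TRIMERS = ["".join(p) for p in product(BASES, repeat=3)]


def trimer_counts(seq):
    """Count overlapping trimers over the whole sequence (64-component vector)."""
    seq = seq.upper()
    c = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [c[t] for t in TRIMERS]


def diversity(counts):
    """Assumed diversity measure D = N log N - sum(n_i log n_i), with 0 log 0 = 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(x * math.log(x) for x in counts if x > 0)


def increment_of_diversity(x, source):
    """ID(X, S) = D(X + S) - D(X) - D(S); it is 0 when X is proportional to S."""
    merged = [a + b for a, b in zip(x, source)]
    return diversity(merged) - diversity(x) - diversity(source)


def classify(seq, sources):
    """Assign seq to the class (key of `sources`) with the least increment of diversity."""
    x = trimer_counts(seq)
    return min(sources, key=lambda name: increment_of_diversity(x, sources[name]))
```

In this sketch, each class "source" would be the summed trimer counts of its training sequences (introns, exons, or intergenic DNA). A query sequence then goes to whichever source its composition perturbs the least.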