A STUDY ON THE SEQUENCES AND SPLICE SITES OF A.THALIANA AND C.ELEGANS GENES

陈翠霞,李前忠,林昊
DOI: https://doi.org/10.3321/j.issn:1000-6737.2004.02.007
2004-01-01
ACTA BIOPHYSICA SINICA
Abstract: The complete genome sequences of A.thaliana and C.elegans are divided into three kinds: exons, introns and intergenic DNA. The probabilities of 64, 40 and 20 trimers in the three kinds of sequences are respectively selected as the parameters of the diversity sources. The class of each sequence is predicted from its three increments of diversity: the sequence is assigned to the class that gives the minimum increment. The results show that the overall prediction accuracies for the A.thaliana chromosomes are 82.19% and 87.95% for the standard sets and test sets, respectively, and those for the C.elegans chromosomes are 79.67% and 81.93%. In addition, the exons in the A.thaliana and C.elegans genomes are divided into three types. Based on the frequencies of the 4 bases in regions near the intron/exon boundaries and the translation initiation and termination sites, a diversity source composed of 12 sequence parameters is constructed. The three types of exons are predicted using an algorithm based on the increment of diversity, and correct prediction rates higher than 80% are obtained.
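The minimum-increment rule described in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definitions D(X) = N log N - sum(n_i log n_i) for the diversity of a count vector and ID(X, Y) = D(X + Y) - D(X) - D(Y) for the increment of diversity; the abstract itself does not state these formulas. It also uses raw counts of all 64 trimers rather than the 64, 40 and 20 selected trimer probabilities, and the function names and toy sequences are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from itertools import product

# All 64 DNA trimers in a fixed order (assumption: the full set is used).
TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]


def trimer_counts(seq):
    """Count overlapping trimers in seq; trimers with other letters are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [counts.get(t, 0) for t in TRIMERS]


def diversity(counts):
    """Diversity D = N log N - sum(n_i log n_i) of a count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small when X and Y have similar composition."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def classify(seq, sources):
    """Return the label of the source with the minimum increment of diversity."""
    query = trimer_counts(seq)
    return min(sources, key=lambda label: increment_of_diversity(query, sources[label]))


# Toy diversity sources (hypothetical sequences, not real genomic data).
sources = {
    "exon": trimer_counts("ATGATGATGATGATG"),
    "intron": trimer_counts("TTTTTTTTTTTTTTT"),
}
predicted = classify("ATGATGATG", sources)
```

In this toy setting the query shares its trimer composition with the "exon" source, so that source yields the smaller increment and is chosen. With real data, each source would be built from a standard set of annotated exons, introns or intergenic sequences.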