Splice Site Prediction Based on Characteristic of Sequential Motifs and C4.5 Algorithm

Hequan Sun, Qinke Peng, Quanwei Zhang, Dan Mou
DOI: https://doi.org/10.1109/fskd.2008.331
2008-01-01
Abstract: Through statistical analysis of the donor site sequences in the HS3D dataset, rules describing which bases appear at adjacent positions around the splice sites are used to construct motifs, which then serve as attributes of the DNA sequences. By assigning a value to each attribute, the literal sequences are transformed into quasi-numeric vectors, on which a decision tree model (the C4.5 algorithm) is built to predict splice sites. The experimental results indicate that, compared with Maisheng Yin's improved motif-scoring model, the proposed method effectively reduces the influence of abnormal data on prediction, showing that the new motif-based encoding method is practical and effective.
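The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the motif list, the 8-base window, the 0/1 attribute values, and the four toy sequences are all assumptions made for illustration, and the abstract does not specify any of them. The sketch turns each sequence into a vector of motif attributes and then ranks the attributes by gain ratio, the split criterion C4.5 uses when growing a tree.

```python
import math
from collections import Counter

# Hypothetical motifs as (offset in window, pattern); NOT taken from the paper.
MOTIFS = [(3, "GT"), (5, "AAG"), (0, "AG")]

def encode(seq, motifs=MOTIFS):
    """Map a DNA string to a 0/1 vector: 1 if the motif occurs at its offset."""
    return [1 if seq[off:off + len(pat)] == pat else 0 for off, pat in motifs]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(vectors, labels, attr):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(vec[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0

# Toy windows (assumed data): label 1 = donor site, 0 = non-site.
sequences = ["CAGGTAAG", "AAGGTGAG", "AGCCTAAG", "TTGACCCA"]
labels = [1, 1, 0, 0]

vectors = [encode(s) for s in sequences]
ratios = [gain_ratio(vectors, labels, a) for a in range(len(MOTIFS))]
best = max(range(len(MOTIFS)), key=lambda a: ratios[a])  # attribute C4.5 would split on first
```

On this toy data the first attribute (the hypothetical GT motif at offset 3) separates the two classes perfectly, so it would be chosen as the root split. A full C4.5 tree would repeat this selection recursively on each subset and then prune.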