A novel genome-wide polyadenylation sites recognition system based on condition random field.

Jiuqiang Han, Shanxin Zhang, Jun Liu, Ruiling Liu
DOI: https://doi.org/10.1109/EMBC.2014.6944687
2014-01-01
Abstract: Polyadenylation, which comprises cleavage of the pre-mRNA and addition of a stretch of adenosines to the 3'-end, is an essential step of pre-mRNA processing in eukaryotes. The known regulatory role of polyadenylation in mRNA localization, stability, and translation, together with the emerging link between poly(A) and disease states, underlines the need to fully characterize polyadenylation sites. Several artificial intelligence methods have been proposed for poly(A) site recognition. However, these methods are suited only to small subsets of genome sequences, so a method for genome-wide recognition of poly(A) sites is needed. Recent work has identified many poly(A)-related factors at the DNA level. Here, we propose a novel genome-wide poly(A) recognition method based on the Conditional Random Field (CRF) that integrates multiple features. Compared with polya_svm (the most accurate program for poly(A) site prediction to date), our method achieved a higher area under the ROC curve (0.8621 versus 0.6796). The result suggests that our method is effective for genome-wide poly(A) site recognition.