Predicting Small RNAs in Bacteria Via Sequence Learning Ensemble Method.

Wen Zhang, Jingwen Shi, Guifeng Tang, Wenjian Wu, Xiang Yue, Dingfang Li
DOI: https://doi.org/10.1109/bibm.2017.8217729
2017-01-01
Abstract: Bacterial small non-coding RNAs (sRNAs) play important roles in various physiological processes, and predicting sRNAs is an important task. In this paper, we develop a computational method for sRNA prediction that uses sRNA sequence-derived features. We investigate a variety of such features and evaluate their usefulness for sRNA prediction. We then develop the sequence learning ensemble method, which predicts sRNAs from a linear weighted sum of the outputs of the individual feature-based predictors; a genetic algorithm is adopted to optimize the parameters of the ensemble system. In the computational experiments, we compile one balanced dataset and four imbalanced datasets, and evaluate our method on these datasets using 5-fold cross-validation. The sequence learning ensemble method achieves AUC scores greater than 0.9 and outperforms existing state-of-the-art sRNA prediction methods. In conclusion, the proposed method has great potential for sRNA prediction. The source code, datasets, and supplementary materials are available at http://www.bioinfotech.cn/BIBM2017/SLEM.
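The ensemble step described in the abstract (a linear weighted sum of per-feature predictor outputs, with weights tuned by a genetic algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy scores, and the specific genetic operators (truncation selection, arithmetic crossover, Gaussian mutation) are assumptions chosen for illustration, and AUC on the training scores stands in for whatever fitness the paper actually uses.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble_scores(weights, predictor_outputs):
    """Linear weighted sum of the individual predictors' scores per sample."""
    return [sum(w * s for w, s in zip(weights, row)) for row in predictor_outputs]

def normalize(weights):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    n = len(weights)
    return [w / total for w in weights] if total > 0 else [1.0 / n] * n

def optimize_weights(labels, predictor_outputs, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm searching for ensemble weights that maximize AUC.

    Assumed operators: truncation selection, arithmetic crossover,
    Gaussian mutation. The top half of each generation is carried over.
    """
    rng = random.Random(seed)
    n = len(predictor_outputs[0])

    def fitness(w):
        return auc(labels, ensemble_scores(w, predictor_outputs))

    # Start from each single predictor (one-hot weights) plus random mixtures.
    population = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    while len(population) < pop_size:
        population.append(normalize([rng.random() for _ in range(n)]))

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]               # crossover
            child = [max(0.0, c + rng.gauss(0, 0.1)) for c in child]  # mutation
            children.append(normalize(child))
        population = parents + children

    return max(population, key=fitness)

# Hypothetical scores from three feature-based predictors on eight sequences
# (rows are samples, columns are predictors; 1 = sRNA, 0 = non-sRNA).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
outputs = [
    [0.9, 0.4, 0.7], [0.6, 0.8, 0.5], [0.4, 0.9, 0.6], [0.7, 0.3, 0.8],
    [0.5, 0.5, 0.2], [0.3, 0.6, 0.4], [0.2, 0.2, 0.6], [0.6, 0.1, 0.3],
]
weights = optimize_weights(labels, outputs)
ensemble_auc = auc(labels, ensemble_scores(weights, outputs))
```

Because the initial population contains each single predictor as a one-hot weight vector and the best half survives every generation, the returned weights score at least as well on these training scores as the best individual predictor. A faithful evaluation would fit the weights inside each training fold of the 5-fold cross-validation and report AUC on the held-out fold.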