OP-Triplet-ELM: Identification of real and pseudo microRNA precursors using extreme learning machine with optimal features.

Cong Pian, Jin Zhang, Yuan-Yuan Chen, Zhi Chen, Qin Li, Qiang Li, Liang-Yun Zhang
DOI: https://doi.org/10.1142/S0219720016500062
2016-01-01
Journal of Bioinformatics and Computational Biology
Abstract: MicroRNAs (miRNAs) are a class of short (21-24 nt) non-coding RNAs that play important regulatory roles in cells. Triplet-SVM-classifier and MiPred (random forest, RF) can distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs). However, the 32-dimensional local contiguous structure-sequence representation introduces substantial information redundancy, so a method that reduces the dimension of the feature space is needed. In this paper, we propose optimal features of local contiguous structure-sequences (OP-Triplet). These features effectively avoid the information redundancy and reduce the dimension of the feature vector from 32 to 8. In addition, a hybrid feature can be formed by combining minimum free energy (MFE) and structural diversity. We also introduce a neural network algorithm called the extreme learning machine (ELM). The results show that the specificity (Sp) and sensitivity (Sn) of our method are 92.4% and 91.0%, respectively. Compared with Triplet-SVM-classifier, the total accuracy (ACC) of our ELM method increases by 5%; compared with MiPred (RF) and miRANN, it increases by nearly 2%. Moreover, our method reduces both the dimension of the feature space and the training time.
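The following is a minimal Python/NumPy sketch of the two ingredients the abstract names. It is not the authors' implementation. It covers the 32-dimensional triplet encoding used by Triplet-SVM-classifier (4 middle nucleotides × 8 paired/unpaired patterns over three adjacent bases) and a basic single-hidden-layer extreme learning machine. It assumes the secondary structure is already available in dot-bracket notation (for example from RNAfold), counts triplets over the whole hairpin, and treats "(" and ")" alike as "paired". The ELM uses random input weights, a sigmoid activation, and a Moore-Penrose pseudoinverse for the output weights. The names `triplet_features` and `ELM`, the hidden-layer size, and the ±1 label coding are hypothetical illustrative choices. The sketch does not reproduce the specific 8 OP-Triplet features or the MFE/structural-diversity terms, because the abstract does not specify how they are selected or computed.

```python
import itertools

import numpy as np

NUCLEOTIDES = "ACGU"
# Paired bases ("(" or ")") are both written as "(", unpaired as ".",
# giving 2**3 = 8 structure patterns for three adjacent bases.
STRUCT_TRIPLETS = ["".join(p) for p in itertools.product("(.", repeat=3)]
# 4 middle nucleotides x 8 structure patterns = 32 feature keys.
FEATURE_KEYS = [n + s for n in NUCLEOTIDES for s in STRUCT_TRIPLETS]


def triplet_features(seq, dot_bracket):
    """Return the normalized 32-dim (middle nucleotide, structure triplet) frequencies."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    struct = dot_bracket.replace(")", "(")  # treat both brackets as "paired"
    counts = dict.fromkeys(FEATURE_KEYS, 0)
    for i in range(1, len(seq) - 1):  # every base that has two neighbours
        key = seq[i] + struct[i - 1:i + 2]
        if key in counts:  # skips ambiguous nucleotides such as "N"
            counts[key] += 1
    vec = np.array([counts[k] for k in FEATURE_KEYS], dtype=float)
    total = vec.sum()
    return vec / total if total else vec


class ELM:
    """Single-hidden-layer extreme learning machine for binary labels (0/1)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        targets = np.where(np.asarray(y) == 1, 1.0, -1.0)
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return (scores >= 0).astype(int)
```

Because only the output weights are solved for, in one linear least-squares step with no iterative back-propagation, ELM training is typically fast. This is consistent with the abstract's point about reduced training time. In practice, each row of `X` would be a feature vector for one hairpin, such as a reduced triplet vector plus MFE and structural-diversity values.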
What problem does this paper attempt to address?