Systematic Analysis Revealed Better Performance of Random Forest Algorithm Coupled with Complex Network Features in Predicting MicroRNA Precursors
Xiaojing Tang,Jiamin Xiao,Yizhou Li,Zhining Wen,Zheng Fang,Menglong Li
DOI: https://doi.org/10.1016/j.chemolab.2012.05.001
IF: 4.175
2012-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: The improvement of computational methods greatly benefits the investigation of miRNAs. Our study validates the features used in miRNA identification with an independent dataset and provides researchers with common practices for developing predictive models. A total of 84 representative features that have appeared in studies of miRNA classification were extracted and divided into four feature sets: the complex network feature set (NET), the structural feature set (STRUC), the thermodynamic feature set (THERMO), and the hybrid feature set (TOTAL). Systematic analysis is carried out on the network, structural, thermodynamic and hybrid features. Dominant features are distinguished from uninformative features in both the single and hybrid sets using a permutation importance strategy. Random forest models constructed using only the informative network, structural, thermodynamic and hybrid variables achieve area under the receiver operating characteristic curve (AUC) values of 0.9611, 0.9563, 0.9351, and 0.9469, respectively, on validated datasets. These results suggest that the best performance is obtained with features derived from complex networks. These results would be invaluable for understanding the biological mechanisms and functions of miRNAs. All the data and scripts used in this article are freely available for download at http://cic.scu.edu.cn/bioinformatics/Extended_miRNA.zip.
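
The permutation importance strategy and the AUC metric mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' script. The toy data, the function names (`auc`, `permutation_importance`, `stand_in_score`) and the stand-in scorer are all assumptions made for illustration; in practice the scorer would be the predicted probability of a trained random forest. The idea is to shuffle one feature column at a time and record how much the AUC drops: informative features cause a large drop, uninformative ones cause almost none.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(score_fn, X, y, n_repeats=10, seed=0):
    """Return the baseline AUC and, per feature, the mean AUC drop after shuffling it."""
    rng = random.Random(seed)
    baseline = auc(y, [score_fn(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc(y, [score_fn(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def stand_in_score(row):
    """Hypothetical scorer standing in for a trained model's predicted probability."""
    return row[0]


# Toy data (an assumption): feature 0 determines the label, feature 1 is pure noise.
n = 100
noise = random.Random(1)
X = [[i / (n - 1), noise.random()] for i in range(n)]
y = [1 if i >= n // 2 else 0 for i in range(n)]

baseline, importances = permutation_importance(stand_in_score, X, y)
print("baseline AUC:", baseline)
print("mean AUC drop per feature:", importances)
```

On this toy data the noise feature gets an importance of exactly zero because the stand-in scorer ignores it, while shuffling the informative feature pushes the AUC toward chance. With scikit-learn available, the same idea could be applied to a fitted `RandomForestClassifier` through `sklearn.inspection.permutation_importance`, keeping only the features whose importance is clearly above zero before refitting.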