A New method of LncRNA classification based on ensemble learning

Zongrui Dai
DOI: https://doi.org/10.1088/1742-6596/1994/1/012002
2021-08-01
Journal of Physics: Conference Series
Abstract: Long noncoding RNAs (lncRNAs), which are longer than 200 bp (base pairs), participate in various critical biological processes. They also share many features with another kind of RNA, coding RNA, such as long transcripts and a poly-A tail. Distinguishing lncRNAs from coding RNAs is therefore an important task in bioinformatics, and machine learning offers a computational approach to lncRNA classification. In this study, two feature selection methods (lasso and PCA) are applied to reduce dimensionality. Eight discriminating features are extracted, and lasso selection performs better than PCA. To improve lncRNA classification performance, a novel ensemble learning model built from primary and secondary learners is constructed. Compared with the other models tested, the ensemble achieves the best AUC and accuracy on the test dataset (median accuracy = 0.950228, AUC = 0.979664), which may shed light on the classification of lncRNA.
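The primary/secondary arrangement described in the abstract is commonly known as stacking. The abstract does not name the learners, the features, or the training protocol, so the following is only a minimal standard-library sketch under stated assumptions: the data are synthetic two-feature points rather than RNA features, the two primary learners (logistic regression and a nearest-centroid scorer) and the logistic-regression secondary learner are illustrative stand-ins, and 5-fold out-of-fold predictions are assumed for training the secondary learner. It reports accuracy and AUC, the two metrics quoted above.

```python
import math
import random

def make_data(n, seed):
    """Synthetic two-class data (an assumption; not the paper's RNA features)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(1.5 * label, 1.0)])
        y.append(label)
    return X, y

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def predict_logistic(w, x):
    # w holds one weight per feature followed by the bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def fit_logistic(X, y, epochs=300, lr=0.5):
    """Logistic regression trained by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_logistic(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def fit_centroid(X, y):
    """Nearest-centroid learner: store the mean vector of each class."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict_centroid(cents, x):
    # Score in (0, 1): larger when x is closer to the class-1 centroid.
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1]))
    return sigmoid(d0 - d1)

# Hypothetical choice of primary learners as (fit, predict) pairs.
PRIMARY = [(fit_logistic, predict_logistic), (fit_centroid, predict_centroid)]

def fit_stack(X, y, k=5):
    """Fit primary learners, then a secondary learner on out-of-fold scores."""
    meta = [[0.0] * len(PRIMARY) for _ in X]
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        held_idx = [i for i in range(len(X)) if i % k == fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, (fit, predict) in enumerate(PRIMARY):
            model = fit(X_tr, y_tr)
            for i in held_idx:
                meta[i][j] = predict(model, X[i])
    primaries = [fit(X, y) for fit, _ in PRIMARY]  # refit on all training data
    secondary = fit_logistic(meta, y)
    return primaries, secondary

def predict_stack(stack, x):
    primaries, secondary = stack
    feats = [predict(m, x) for m, (_, predict) in zip(primaries, PRIMARY)]
    return predict_logistic(secondary, feats)

def auc_score(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

X_train, y_train = make_data(300, seed=0)
X_test, y_test = make_data(200, seed=1)
stack = fit_stack(X_train, y_train)
scores = [predict_stack(stack, x) for x in X_test]
preds = [int(s >= 0.5) for s in scores]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
auc = auc_score(scores, y_test)
print(f"accuracy={accuracy:.3f}  AUC={auc:.3f}")
```

Training the secondary learner on out-of-fold scores, rather than on scores from primary learners that have already seen the same samples, keeps it from simply trusting whichever primary learner overfits most. The numbers this sketch prints come from synthetic data and are not comparable to the results reported in the abstract.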