PtLnc-BXE: Prediction of Plant lncRNAs Using a Bagging-XGBoost-ensemble Method with Multiple Features.

Guangyan Zhang, Ziru Liu, Jichen Dai, Zilan Yu, Shuai Liu, Wen Zhang
DOI: https://doi.org/10.48550/arxiv.1911.00185
2019-01-01
Abstract: Motivation: Since long non-coding RNAs (lncRNAs) are involved in a wide range of functions in cellular and developmental processes, an increasing number of methods have been proposed for distinguishing lncRNAs from coding RNAs. However, most existing methods are designed for lncRNAs in animal systems, and only a few focus on plant lncRNA identification. Plant lncRNAs have characteristics distinct from those of lncRNAs in animal systems, so it is desirable to develop a computational method for accurate and robust identification of plant lncRNAs. Results: Herein, we present a plant lncRNA identification method, ItLnc-BXE, which utilizes multiple features and an ensemble learning strategy. First, a diverse set of lncRNA features is collected and filtered by feature selection to represent RNA transcripts. Then, several base learners are trained and combined into a single meta-learner by ensemble learning, yielding an ItLnc-BXE model. ItLnc-BXE models are evaluated on datasets of six plant species; the results show that ItLnc-BXE outperforms other state-of-the-art plant lncRNA identification methods, achieving better and more robust performance (AUC>95.91%). We also perform experiments on cross-species lncRNA identification, and the results indicate that dicots-based and monocots-based models can be used to accurately identify lncRNAs in lower plant species, such as mosses and algae. Availability: Source code is available at https://github.com/BioMedicalBigDataMiningLab/ItLnc-BXE. Contact: zhangwen@mail.hzau.edu.cn or zhangwen@whu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
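The pipeline outlined in the abstract (represent each transcript by features, train several base learners, combine them into one ensemble) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the 2-mer frequency features, the nearest-centroid base learner (a stand-in for XGBoost), the class-stratified bootstrap, the majority vote, and the toy sequences and labels are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter
from itertools import product

# All 16 dinucleotides; a toy stand-in for the paper's richer feature set.
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]

def kmer_features(seq):
    """Return normalized 2-mer frequencies of an RNA/DNA sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(sum(counts[k] for k in KMERS), 1)  # avoid division by zero
    return [counts[k] / total for k in KMERS]

class CentroidLearner:
    """Nearest-centroid classifier (hypothetical stand-in for an XGBoost model)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))

def bagging_fit(X, y, n_learners=25, seed=0):
    """Train each base learner on a bootstrap sample drawn within each class."""
    rng = random.Random(seed)
    by_label = {}
    for i, label in enumerate(y):
        by_label.setdefault(label, []).append(i)
    learners = []
    for _ in range(n_learners):
        # Resample with replacement inside every class so both classes are present.
        idx = [rng.choice(ids) for ids in by_label.values() for _ in ids]
        learners.append(CentroidLearner().fit([X[i] for i in idx], [y[i] for i in idx]))
    return learners

def bagging_predict(learners, x):
    """Combine the base learners by majority vote."""
    votes = Counter(learner.predict_one(x) for learner in learners)
    return votes.most_common(1)[0][0]

# Toy data only: label 1 = "lncRNA-like", label 0 = "coding-like" (made up).
train = [
    ("ATATATATAT", 1), ("AATTAATTAA", 1), ("TATATTATAT", 1), ("ATTTAAATAT", 1),
    ("GCGCGCGCGC", 0), ("GGCCGGCCGG", 0), ("CGCGCCGCGC", 0), ("GCCCGGGCGC", 0),
]
X = [kmer_features(s) for s, _ in train]
y = [label for _, label in train]
learners = bagging_fit(X, y)

print(bagging_predict(learners, kmer_features("ATATTTAAAT")))
print(bagging_predict(learners, kmer_features("GCGCCCGGGC")))
```

In a realistic setting the base learner would be a gradient-boosted tree model, and the feature vector would combine several selected feature families rather than dinucleotide frequencies alone.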