LightGBM-LncLoc: A LightGBM-Based Computational Predictor for Recognizing Long Non-Coding RNA Subcellular Localization

Jianyi Lyu,Peijie Zheng,Yue Qi,Guohua Huang
DOI: https://doi.org/10.3390/math11030602
IF: 2.4
2023-01-26
Mathematics
Abstract:Long non-coding RNAs (lncRNA) are a class of RNA transcripts with more than 200 nucleotide residues. LncRNAs play versatile roles in cellular processes and are thus becoming a hot topic in the field of biomedicine. The function of lncRNAs was discovered to be closely associated with subcellular localization. Although many methods have been developed to identify the subcellular localization of lncRNAs, there still is much room for improvement. Herein, we present a lightGBM-based computational predictor for recognizing lncRNA subcellular localization, which is called LightGBM-LncLoc. LightGBM-LncLoc uses reverse complement k-mer and position-specific trinucleotide propensity based on the single strand for multi-class sequences to encode LncRNAs and employs LightGBM as the learning algorithm. LightGBM-LncLoc reaches state-of-the-art performance by five-fold cross-validation and independent test over the datasets of five categories of lncRNA subcellular localization. We also implemented LightGBM-LncLoc as a user-friendly web server.
mathematics
What problem does this paper attempt to address?