Abstract: Accumulating evidence indicates that long noncoding RNAs (lncRNAs) play important roles in molecular and cellular biology. Although many algorithms have been developed to reveal their associations with complex diseases by using downstream targets, upstream (epi)genetic regulatory information has not been sufficiently leveraged to predict the function of lncRNAs in various biological processes. We therefore present FunlncModel, an interpretable machine learning–based computational framework that identifies functional lncRNAs by integrating a large number of (epi)genetic and functional genomic features from their upstream and downstream multi-omic regulatory networks. We used the random forest method to mine nearly 60 features in three categories from >2000 datasets across 11 data types, including transcription factors (TFs), histone modifications, typical enhancers, super-enhancers, methylation sites, and mRNAs. FunlncModel outperformed alternative methods in classification performance in human embryonic stem cells (hESCs), with an area under the receiver operating characteristic curve (AUROC) of 0.95 and an area under the precision-recall curve (AUPRC) of 0.97. It not only recovered most of the known lncRNAs that influence stem cell states but also discovered novel high-confidence functional lncRNAs. We extensively validated FunlncModel's efficacy on up to 27 cancer-related functional prediction tasks, which involved multiple cancer cell growth processes and cancer hallmarks. We also found that (epi)genetic regulatory features, such as TFs and histone modifications, serve as strong predictors of lncRNA function. Overall, FunlncModel is a strong and stable prediction model for identifying functional lncRNAs in specific cellular contexts. FunlncModel is available as a web server at https://bio.liclab.net/FunlncModel/.
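
The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = functional lncRNA, 0 = non-functional) and a classifier score per lncRNA, and it uses average precision as the AUPRC estimate, which is one common convention; the abstract does not say how the authors computed their values. The function names and the toy data are hypothetical and are not part of FunlncModel.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (an AUPRC estimate).

    Assumes scores are distinct; tied scores are ordered arbitrarily by the sort.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / n_pos


# Hypothetical toy data: six lncRNAs with made-up labels and scores.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 3))
print(round(average_precision(toy_labels, toy_scores), 3))
```

AUROC measures how well positives are ranked above negatives overall, while AUPRC is more sensitive to performance on the positive class, which matters when functional lncRNAs are a small fraction of the candidates.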

Prelnc2: A prediction tool for lncRNAs with enhanced multi-level features of RNAs

PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features

LncLSTA: A Versatile Predictor Unveiling Subcellular Localization of Lncrnas Through Long-Short Term Attention

PLEKv2: predicting lncRNAs and mRNAs based on intrinsic sequence features and the coding-net model

A Hybrid Prediction Method for Plant lncRNA-Protein Interaction.

A method for evaluating of RNA's coding potential using the interaction effects of open reading frames and high-energy scalograms

Predicting Long non-coding RNAs through feature ensemble learning

lncLocPred: Predicting LncRNA Subcellular Localization Using Multiple Sequence Feature Information

IIMLP: integrated information-entropy-based method for LncRNA prediction

ABLNCPP: Attention Mechanism-Based Bidirectional Long Short-Term Memory for Noncoding RNA Coding Potential Prediction

EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning

PreSubLncR: Predicting Subcellular Localization of Long Non-Coding RNA Based on Multi-Scale Attention Convolutional Network and Bidirectional Long Short-Term Memory Network

In-depth characterization and identification of translatable lncRNAs

MFPred: prediction of ncRNA families based on multi-feature fusion

Towards a better prediction of subcellular location of long non-coding RNA

LncDLSM: Identification of Long Non-Coding RNAs With Deep Learning-Based Sequence Model

FunlncModel: integrating multi-omic features from upstream and downstream regulatory networks into a machine learning framework to identify functional lncRNAs

MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding

PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets