Abstract: In the past decade, hundreds of long noncoding RNAs (lncRNAs) have been identified as significant players in diverse types of cancer; however, the functions and mechanisms of most lncRNAs in cancer remain unclear. Several computational methods have been developed to detect associations between cancer and lncRNAs, yet these approaches are limited in both sensitivity and specificity. To improve the accuracy of predicting lncRNA-cancer associations, we upgraded our previously developed cancer-related lncRNA classifier, CRlncRC, to generate CRlncRC2. CRlncRC2 is an eXtreme Gradient Boosting (XGBoost) machine learning framework that incorporates Synthetic Minority Over-sampling Technique (SMOTE)-based over-sampling and Laplacian Score-based feature selection. Ten-fold cross-validation showed that the AUC value of CRlncRC2 for identification of cancer-related lncRNAs is much higher than the values previously reported for CRlncRC and other methods. Compared with CRlncRC, the number of features used by CRlncRC2 dropped from 85 to 51. Finally, we identified 439 cancer-related lncRNA candidates using CRlncRC2. To evaluate the accuracy of the predictions, we first consulted the cancer-related long non-coding RNA database Lnc2Cancer v2.0 and relevant literature for supporting information, then conducted statistical analyses of somatic mutations, distance from cancer genes, and differential expression in tumor tissues, using various data sets. The results showed that our approach was highly reliable for identifying cancer-related lncRNA candidates. Notably, the highest ranked candidate, lncRNA AC074117.1, has not been reported previously; however, integrated multi-omics analyses demonstrate that it is the target of multiple cancer-related miRNAs and interacts with adjacent protein-coding genes, suggesting that it may act as a cancer-related competing endogenous RNA, which warrants further investigation.
In conclusion, CRlncRC2 is an effective and accurate method for identification of cancer-related lncRNAs, and has potential to contribute to the functional annotation of lncRNAs and guide cancer therapy.
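
The SMOTE over-sampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (which the abstract does not show); the function name `smote`, its parameters, and the toy feature vectors are assumptions made purely for illustration. The idea is that each synthetic minority-class sample is placed at a random position on the segment between a real minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority-class neighbours.

    Hypothetical helper for illustration; requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy stand-in for feature vectors of the minority (cancer-related) class
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

In practice, over-sampling of this kind is applied only to the training folds during cross-validation, so that synthetic samples never leak into the held-out fold used to compute the AUC.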

Identification of Cancer-Related Long Non-Coding RNAs Using XGBoost With High Accuracy
