Abstract:

MOTIVATION: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs across a wide range of species remains challenging: it requires either prior knowledge of well-established sequences and annotations or species-specific training data, yet only a limited number of species have high-quality sequences and annotations.

RESULTS: Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the relationship between open reading frame (ORF) length and guanine-cytosine (GC) content diverges substantially and universally between lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that accurately distinguishes lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms, achieving higher accuracy and well-balanced sensitivity and specificity, and that it is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species ranging from plants to mammals. To our knowledge, this study is the first to differentially characterize lncRNAs and protein-coding RNAs based on feature relationship and to apply that characterization to the computational identification of lncRNAs. Taken together, our study represents a significant advance in the characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.

AVAILABILITY AND IMPLEMENTATION: The LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
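The two features named in the abstract, ORF length and GC content, can be computed directly from a transcript sequence. The sketch below is only an illustration of those two measurements and is not the LGC scoring method, which the abstract does not describe. The function names (`gc_content`, `longest_orf_length`, `orf_gc_features`) are hypothetical, and the ORF definition used here (forward strand only, ATG start, standard stop codons, stop codon counted in the length) is an assumption.

```python
# Illustrative sketch only: computes the two features named in the abstract.
# It does NOT implement LGC's classification rule.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def _normalize(seq: str) -> str:
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return seq.upper().replace("U", "T")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = _normalize(seq)
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG-to-stop ORF.

    Assumptions: forward strand only, all three reading frames,
    and the stop codon is included in the reported length.
    """
    seq = _normalize(seq)
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_gc_features(seq: str) -> tuple[int, float]:
    """Return (longest ORF length, GC content) for one transcript."""
    return longest_orf_length(seq), gc_content(seq)
```

Calling `orf_gc_features` on each transcript in a set yields the pair of values whose relationship the abstract reports as differing between lncRNAs and protein-coding RNAs. How those values are combined into a prediction is left to the published scripts.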

Flnc: Machine Learning Improves the Identification of Novel Full-length Long Noncoding RNAs from RNA Sequencing Data Without Transcriptional Initiation Profiles

Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data.

LncLSTA: A Versatile Predictor Unveiling Subcellular Localization of lncRNAs Through Long-Short Term Attention

LncFinder: an Integrated Platform for Long Non-Coding RNA Identification Utilizing Sequence Intrinsic Composition, Structural Information and Physicochemical Property.

FLYNC: A Machine Learning-Driven Framework for Discovering Long Non-Coding RNAs in

lncRNA-MFDL: Identification of Human Long Non-Coding RNAs by Fusing Multiple Features and Using Deep Learning

Identification and Function Annotation of Long Intervening Noncoding RNAs

Prediction of Novel Long Non-Coding RNAs Based on RNA-Seq Data of Mouse Klf1 Knockout Study

Prediction of Long Non-Coding RNAs Based on Deep Learning

Evaluation of deep-learning-based lncRNA identification tools

In-depth characterization and identification of translatable lncRNAs

ncRFP: A Novel End-to-end Method for Non-Coding RNAs Family Prediction Based on Deep Learning.

Characterization and Identification of Long Non-Coding RNAs Based on Feature Relationship.

LncDLSM: Identification of Long Non-Coding RNAs With Deep Learning-Based Sequence Model

PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features

CFC-seq: identification of full-length capped RNAs unveil enhancer-derived transcription

Long Noncoding RNA Identification: Comparing Machine Learning Based Tools for Long Noncoding Transcripts Discrimination.

Lncident: A Tool for Rapid Identification of Long Noncoding RNAs Utilizing Sequence Intrinsic Composition and Open Reading Frame Information

Deep annotation of long noncoding RNAs by assembling RNA-seq and small RNA-seq data

FunlncModel: integrating multi-omic features from upstream and downstream regulatory networks into a machine learning framework to identify functional lncRNAs

PreLnc2: A prediction tool for lncRNAs with enhanced multi-level features of RNAs