In-depth characterization and identification of translatable lncRNAs
Meng Zhang, Jian Zhao, Jing Wu, Yulan Wang, Minhui Zhuang, Lingxiao Zou, Renlong Mao, Bin Jiang, Jingjing Liu, Xiaofeng Song
DOI: https://doi.org/10.1016/j.compbiomed.2023.107243
IF: 7.7
2023-07-15
Computers in Biology and Medicine
Abstract: Long non-coding RNAs (lncRNAs) are non-protein-coding transcripts more than 200 nucleotides in length. Deep sequencing technologies have revealed that lncRNAs can harbor translatable short open reading frames (sORFs), yet the regulatory mechanisms governing lncRNA translation remain poorly understood. Here, we systematically examined the sequence, functional-element, and structural features relevant to lncRNA translation in humans. Extensive identification and analysis reveal that translatable lncRNAs contain richer protein-coding-related sequence features, use cap-dependent and cap-independent translation initiation mechanisms, and have more stable secondary structures than untranslatable lncRNAs. These findings strongly support the view that lncRNAs serve as a repository for the production of new small peptides. By fusing the features that affect translation and applying the extreme gradient boosting (XGBoost) algorithm, we developed TransLncPred, the first computational tool dedicated to predicting translatable lncRNAs. Benchmark results show that our method outperforms several state-of-the-art RNA coding potential prediction tools on the same training and testing datasets. One hundred repetitions of 10-fold cross-validation also demonstrate that regulatory-element-derived features, especially N7-methylguanosine (m7G) and internal ribosome entry site (IRES), improve predictive performance.
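To make the notion of a short open reading frame concrete, the following is a minimal sketch of a forward-strand sORF scan. It is not the TransLncPred pipeline: the function name `find_sorfs`, the canonical ATG start codon, and the length bounds `min_codons` and `max_codons` are illustrative assumptions, since the abstract does not state how sORFs were defined.

```python
from typing import List, Tuple

START = "ATG"                    # assumption: canonical start codon only
STOPS = {"TAA", "TAG", "TGA"}    # standard stop codons


def find_sorfs(seq: str, min_codons: int = 10,
               max_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Scan the three forward reading frames for short ORFs.

    Returns (frame, start, end) tuples with 0-based start and exclusive
    end; the span includes the stop codon. Length bounds count codons
    from the start codon up to, but not including, the stop codon.
    The default bounds are hypothetical, not taken from the paper.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3    # codons before the stop
                if min_codons <= n_codons <= max_codons:
                    hits.append((frame, start, i + 3))
                start = None                   # close the ORF, keep scanning
    return hits
```

Calling `find_sorfs(transcript)` on a lncRNA sequence yields candidate sORF coordinates. A real analysis would also need evidence of translation, such as ribosome profiling, because the presence of an ORF alone does not show that it is translated.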
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology