A method for evaluating of RNA's coding potential using the interaction effects of open reading frames and high-energy scalograms
Hua Gao, Peng Gao, Ning Ye
DOI: https://doi.org/10.1016/j.compbiomed.2023.107752
IF: 7.7
2023-11-27
Computers in Biology and Medicine
Abstract: The identification and function determination of long non-coding RNAs (lncRNAs) can help to better understand transcriptional regulation in both normal development and disease pathology, which demands methods to distinguish them from protein-coding RNAs (pcRNAs) once sequencing data have been obtained. Many algorithms based on the statistical, structural, physical, and chemical properties of sequences have been developed to evaluate the coding potential of RNA for this purpose. To obtain general features that do not rely on hyperparameter tuning and optimization yet can still be evaluated accurately, we designed a series of features from the interaction effects of open reading frames (ORFs) with one another and with the electrical intensity of sequence sites, in order to further improve screening accuracy. The single model constructed from our designed features meets the strong classifier criteria, with accuracy between 82% and 89%, and the prediction accuracy of the model constructed after combining the auxiliary features equals or exceeds that of some of the best classification tools. Moreover, our method does not require special hyperparameter tuning and is species-insensitive compared with other methods, which means it can be easily applied to a wide range of species. We also find some correlations between the features, which provide a reference for follow-up studies.
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology
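The abstract builds its features on open reading frames (ORFs). As a point of orientation only, the following is a minimal Python sketch of basic ORF extraction and two simple ORF-derived descriptors (longest ORF length and coverage). It is not the authors' feature set: the interaction-effect and scalogram features are not specified in the abstract and are not reproduced here. The function names `find_orfs` and `longest_orf_features`, the restriction to the three forward frames, and the use of the standard start and stop codons are all assumptions made for illustration.

```python
# Hypothetical illustration: basic ORF scan, not the paper's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
START_CODON = "ATG"


def find_orfs(seq, min_codons=1):
    """Return (start, end) index pairs of ORFs in the three forward frames.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop, start codon included.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def longest_orf_features(seq):
    """Summarize a sequence by ORF count, longest ORF length, and coverage."""
    orfs = find_orfs(seq)
    if not orfs:
        return {"orf_count": 0, "longest_orf_len": 0, "orf_coverage": 0.0}
    start, end = max(orfs, key=lambda o: o[1] - o[0])
    length = end - start
    return {
        "orf_count": len(orfs),
        "longest_orf_len": length,
        "orf_coverage": length / len(seq),
    }


if __name__ == "__main__":
    print(longest_orf_features("CCAUGAAAUUUUAGGG"))
```

Descriptors of this kind could then be passed to any off-the-shelf classifier; which classifier and which auxiliary features the authors actually used is not stated in the abstract.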