A BERT-based model for the prediction of lncRNA subcellular localization in Homo sapiens

Zhao-Yue Zhang, Zheng Zhang, Xiucai Ye, Tetsuya Sakurai, Hao Lin
DOI: https://doi.org/10.1016/j.ijbiomac.2024.130659
IF: 8.2
2024-03-21
International Journal of Biological Macromolecules
Abstract: Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. The conventional detection of lncRNA subcellular location usually uses in situ detection techniques, which are resource-intensive. Some machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, due to the low level of conservation of lncRNA sequences, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information of lncRNAs in Homo sapiens. Subsequently, based on the BERT pre-trained language algorithm, we developed a model for lncRNA subcellular location prediction. Our model achieved a micro-average area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the testing nucleus set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational analysis tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.
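The abstract reports a micro-average AUROC, which pools every (sample, class) decision into a single binary problem before computing the area under the ROC curve. The sketch below illustrates that metric in general and is not the authors' evaluation code: the three compartments, the labels, and the scores are toy values invented for illustration, and the function names `auroc` and `micro_average_auroc` are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auroc(y_true, y_score):
    """Flatten the one-vs-rest label and score matrices, then compute a
    single AUROC over all (sample, class) pairs."""
    flat_labels = [y for row in y_true for y in row]
    flat_scores = [s for row in y_score for s in row]
    return auroc(flat_labels, flat_scores)


# Toy data (assumption): 4 transcripts, 3 hypothetical compartments.
# Each row of y_true is a one-hot label; each row of y_score holds the
# predicted score per compartment.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_score = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.5, 0.1],
]

micro = micro_average_auroc(y_true, y_score)
print(f"micro-average AUROC = {micro:.3f}")
```

Because micro-averaging weights every decision equally, frequent classes dominate the score; a macro average (the mean of per-class AUROCs) would instead weight each compartment equally.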
polymer science; biochemistry & molecular biology; chemistry, applied
What problem does this paper attempt to address?