A Deep Learning Approach to LncRNA Subcellular Localization Using Inexact q-mers

Weijun Yi, Donald A. Adjeroh
DOI: https://doi.org/10.1109/bibm52615.2021.9669409
2021-12-09
Abstract: Long non-coding ribonucleic acids (lncRNAs) can be localized to different cellular components, such as the nucleus, exosome, cytoplasm, ribosome, etc. Their biological functions can be influenced by the region of the cell where they are located. Many of these lncRNAs are associated with different challenging diseases. Thus, it is crucial to study their subcellular localization. However, compared to the massive number of lncRNAs, only relatively few have annotations in terms of their subcellular localization. Conventional computational methods use q-mer profiles from lncRNA sequences and train machine learning models, such as support vector machines and logistic regression, with the profiles. These methods focus on the exact q-mer. Given possible sequence mutations and other uncertainties in genomic sequences and their role in biological function, a consideration of these changes might improve our ability to model lncRNAs and their localization. We hypothesize that considering these changes may improve our ability to predict subcellular localization of lncRNAs. To test this hypothesis, we propose a deep learning model with inexact q-mers for the localization of lncRNAs in the cell. The proposed method can obtain a high overall accuracy of 94.7%, an average of 91.3% on a benchmark dataset, using 8-mers with mismatches. In comparison, the exact 8-mer result was 89.8%. The proposed approach outperformed existing state-of-the-art lncRNA localization predictors on two different datasets. Our results, therefore, support the hypothesis that deep learning models using inexact q-mers can improve the performance of computational lncRNA localization algorithms.
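To make the difference between exact and inexact q-mer profiles concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: it assumes that "mismatches" means Hamming distance between a q-mer and a sequence window, that sequences use the RNA alphabet ACGU, and that a brute-force scan is acceptable. The function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet


def exact_qmer_counts(seq, q):
    """Count every overlapping length-q window of seq."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def inexact_qmer_profile(seq, q, max_mismatches=1):
    """For each possible q-mer, count the windows of seq that lie
    within max_mismatches substitutions of it (brute force)."""
    windows = exact_qmer_counts(seq, q)
    profile = {}
    for qmer in map("".join, product(ALPHABET, repeat=q)):
        profile[qmer] = sum(
            count for window, count in windows.items()
            if hamming(qmer, window) <= max_mismatches
        )
    return profile


seq = "ACGUACGU"
exact = exact_qmer_counts(seq, 3)
inexact = inexact_qmer_profile(seq, 3, max_mismatches=1)
print(exact["ACA"], inexact["ACA"])
```

In this toy sequence, `ACA` never occurs exactly, yet it receives a nonzero inexact count because the window `ACG` is one substitution away. The brute-force loop visits all 4^q q-mers, so it is only practical for small q; a real pipeline for 8-mers would likely need a more efficient neighborhood enumeration.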