GraphLncLoc: long non-coding RNA subcellular localization prediction using graph convolutional networks based on sequence to graph transformation

Min Li,Baoying Zhao,Rui Yin,Chengqian Lu,Fei Guo,Min Zeng
DOI: https://doi.org/10.1093/bib/bbac565
IF: 9.5
2022-12-23
Briefings in Bioinformatics
Abstract:The subcellular localization of long non-coding RNAs (lncRNAs) is crucial for understanding lncRNA functions. Most of existing lncRNA subcellular localization prediction methods use k -mer frequency features to encode lncRNA sequences. However, k -mer frequency features lose sequence order information and fail to capture sequence patterns and motifs of different lengths. In this paper, we proposed GraphLncLoc, a graph convolutional network-based deep learning model, for predicting lncRNA subcellular localization. Unlike previous studies encoding lncRNA sequences by using k -mer frequency features, GraphLncLoc transforms lncRNA sequences into de Bruijn graphs, which transforms the sequence classification problem into a graph classification problem. To extract the high-level features from the de Bruijn graph, GraphLncLoc employs graph convolutional networks to learn latent representations. Then, the high-level feature vectors derived from de Bruijn graph are fed into a fully connected layer to perform the prediction task. Extensive experiments show that GraphLncLoc achieves better performance than traditional machine learning models and existing predictors. In addition, our analyses show that transforming sequences into graphs has more distinguishable features and is more robust than k -mer frequency features. The case study shows that GraphLncLoc can uncover important motifs for nucleus subcellular localization. GraphLncLoc web server is available at http://csuligroup.com:8000/GraphLncLoc/.
biochemical research methods,mathematical & computational biology
What problem does this paper attempt to address?