Identification of lncRNA-Related Protein-Coding Genes Using Multi-Omics Data Based on Deep Learning and Matrix Completion

Meihong Gao, Xuequn Shang
DOI: https://doi.org/10.1109/bibm55620.2022.9995428
2022-01-01
Abstract: Long noncoding RNAs (lncRNAs) can regulate the expression of protein-coding genes (PCGs) and thereby cause disease. Identifying lncRNA-PCG associations (LGAs) helps reveal the pathogenic mechanisms of lncRNAs. Nevertheless, this remains challenging because lncRNA expression is heterogeneous and its regulatory patterns are complex. Biological experiments have been designed to identify LGAs, but time and cost constraints prevent their use on a large scale, which makes computational methods crucial for LGA research. Here, we propose a new computational model, DNNMC, that reveals potential LGAs using deep neural networks and inductive matrix completion on association and multi-omics data. We first integrate LGA and multi-omics similarities to construct lncRNA and PCG similarity networks. Deep graph convolutional networks are then used to learn features of lncRNAs and PCGs. Finally, the learned features and the known LGA matrix are fed to the inductive matrix completion module to predict potential LGAs. Experimental results on three datasets demonstrate that DNNMC outperforms other machine learning methods in predicting LGAs. Furthermore, multi-omics features are shown to improve the performance of LGA identification. In conclusion, DNNMC can effectively complete the LGA prediction task and help reveal the regulatory mechanisms of lncRNAs in disease.
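The three stages named in the abstract (similarity integration, graph-convolutional feature learning, inductive matrix completion) can be illustrated with a small NumPy sketch. This is not the DNNMC implementation: the toy data, the equal-weight averaging of similarities, the single untrained graph-convolution layer with random weights, and all function names and hyperparameters (`rank`, `lam`, `lr`, `steps`) are assumptions made here for illustration only. The completion step fits the standard inductive form M ≈ X W Hᵀ Yᵀ by plain gradient descent.

```python
import numpy as np

def cosine_similarity(F):
    """Pairwise cosine similarity between the rows of F."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    U = F / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return U @ U.T

def normalize_adjacency(S):
    """Symmetric normalization with self-loops: D^-1/2 (S + I) D^-1/2."""
    A = S + np.eye(S.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_features(S, X, weights):
    """Propagate X through GCN-style layers H <- relu(A_hat H W).
    The weights are NOT trained here; this only shows the data flow."""
    A_hat = normalize_adjacency(S)
    H = X
    for W in weights:
        H = np.maximum(A_hat @ H @ W, 0.0)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return H / np.maximum(norms, 1e-12)  # unit-length rows keep the fit stable

def imc_fit(M, X, Y, rank=2, lam=0.1, lr=0.005, steps=400, seed=0):
    """Fit M ~ X W H^T Y^T by gradient descent on
    0.5*||X W H^T Y^T - M||_F^2 + 0.5*lam*(||W||_F^2 + ||H||_F^2)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(steps):
        R = X @ W @ H.T @ Y.T - M            # residual on the association matrix
        grad_W = X.T @ R @ Y @ H + lam * W
        grad_H = Y.T @ R.T @ X @ W + lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    n_lnc, n_pcg, n_samples, dim = 6, 5, 8, 4
    # Hypothetical toy inputs: a binary LGA matrix and one omics layer per side.
    M = (rng.random((n_lnc, n_pcg)) < 0.3).astype(float)
    expr_lnc = rng.random((n_lnc, n_samples))
    expr_pcg = rng.random((n_pcg, n_samples))
    # Stage 1: fuse association-based and omics-based similarity (simple average).
    S_lnc = 0.5 * (cosine_similarity(M) + cosine_similarity(expr_lnc))
    S_pcg = 0.5 * (cosine_similarity(M.T) + cosine_similarity(expr_pcg))
    # Stage 2: graph-convolutional features, using similarity rows as input features.
    X = gcn_features(S_lnc, S_lnc, [rng.standard_normal((n_lnc, dim))])
    Y = gcn_features(S_pcg, S_pcg, [rng.standard_normal((n_pcg, dim))])
    # Stage 3: inductive matrix completion yields a score for every lncRNA-PCG pair.
    W, H = imc_fit(M, X, Y)
    return X @ W @ H.T @ Y.T

scores = run_demo()
```

In this sketch, `scores[i, j]` is a predicted association strength for lncRNA `i` and PCG `j`; ranking the entries that are zero in the known LGA matrix would give candidate associations. A real pipeline would train the graph-convolutional weights and tune the fusion and regularization choices, none of which is specified in the abstract.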