Self-Supervised Contrastive Learning on Attribute and Topology Graphs for Predicting Relationships Among lncRNAs, miRNAs and Diseases

Lan Huang, Nan Sheng, Ling Gao, Lei Wang, Wenju Hou, Jie Hong, Yan Wang
DOI: https://doi.org/10.1109/JBHI.2024.3467101
2024-09-24
Abstract: Exploring potential associations among long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and diseases is an essential part of disease prevention, diagnosis and treatment. Because determining these relationships experimentally is resource-intensive and time-consuming, computational methods have emerged as an attractive alternative. However, existing computational approaches for inferring lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs) and lncRNA-miRNA interactions (LMIs) tend to focus on a single task, neglecting the benefits of leveraging multiple biomolecular interactions and domain-specific knowledge for multi-task prediction. Furthermore, labeled data for LDAs, MDAs and LMIs are scarce and costly in real-world applications, making it challenging for models to learn comprehensive node embedding patterns. Building on our previous work, this paper proposes a multi-task prediction model, SSCLMD, that employs self-supervised contrastive learning on attribute and topology graphs to identify potential LDAs, MDAs and LMIs. First, domain knowledge of lncRNAs, miRNAs and diseases is used to construct an attribute graph, and their interactions are used to construct a topology graph. The nodes are then encoded in the attribute and topology spaces to extract view-specific and common features, and an attention mechanism adaptively fuses the embeddings from the different views. SSCLMD incorporates a contrastive self-supervised learning task that guides the learning of node embeddings in both the attribute and topology spaces without relying on labels. Serving as a regularizer in the multi-task learning paradigm, this task improves the model's generalization capability. Extensive experiments on two manually curated datasets demonstrate that SSCLMD significantly outperforms baseline methods in the LDA, MDA and LMI prediction tasks. Additionally, case studies on both new and old datasets further support the ability of SSCLMD to uncover novel disease-related lncRNAs and miRNAs. The source code and supplementary file of this work are publicly available at https://github.com/sheng-n/SSCLMD.
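To make the two mechanisms named in the abstract concrete, the following is a minimal NumPy sketch of (a) a cross-view contrastive loss between attribute-space and topology-space node embeddings and (b) attention-based fusion of the two views. The abstract does not give the exact objective or attention form, so this assumes a generic InfoNCE-style loss and a simple tanh-scored attention; the function names, the `temperature` value, and the parameters `w` and `q` are illustrative assumptions, not taken from the SSCLMD code.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def cross_view_contrastive_loss(z_attr, z_topo, temperature=0.5):
    """InfoNCE-style loss (assumed form): the same node in the two views is the
    positive pair; every other node in the topology view acts as a negative."""
    a = l2_normalize(z_attr)
    t = l2_normalize(z_topo)
    logits = a @ t.T / temperature                        # (n, n) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # positives on the diagonal


def attention_fuse(views, w, q):
    """Fuse k view embeddings per node with softmax attention weights.
    views: list of (n, d) arrays; w: (d, h) and q: (h,) are hypothetical
    attention parameters that would normally be learned."""
    stacked = np.stack(views, axis=1)                     # (n, k, d)
    scores = np.tanh(stacked @ w) @ q                     # (n, k)
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    fused = (alpha[..., None] * stacked).sum(axis=1)      # (n, d)
    return fused, alpha


# Toy usage with random embeddings for 8 nodes of dimension 4.
rng = np.random.default_rng(0)
z_attr = rng.normal(size=(8, 4))
z_topo = z_attr + 0.1 * rng.normal(size=(8, 4))
reg_loss = cross_view_contrastive_loss(z_attr, z_topo)
fused, alpha = attention_fuse(
    [z_attr, z_topo], rng.normal(size=(4, 3)), rng.normal(size=3)
)
```

In a multi-task setting of the kind the abstract describes, a loss like `reg_loss` would typically be added, with some weighting, to the supervised LDA, MDA and LMI prediction losses; the weighting scheme is not stated in the abstract and is left out here.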