MedGraphNet: Leveraging Multi-Relational Graph Neural Networks and Text Knowledge for Biomedical Predictions

Oladimeji Macaulay, Michael Servilla, Kushal Virupakshappa, David Arredondo, Yue Hu, Luis Tafoya, Yanfu Zhang, Avinash Sahu
DOI: https://doi.org/10.1101/2024.09.24.614782
2024-09-25
Abstract: Genetic, molecular, and environmental factors influence diseases through complex interactions with genes, phenotypes, and drugs. Current methods often fail to integrate diverse multi-relational biological data meaningfully, limiting the discovery of novel risk genes and drugs. To address this, we present MedGraphNet, a multi-relational Graph Neural Network (GNN) model designed to infer relationships among drugs, genes, diseases, and phenotypes. MedGraphNet initializes nodes using informative embeddings from existing text knowledge, allowing for robust integration of various data types and improved generalizability. Our results demonstrate that MedGraphNet matches and often outperforms traditional single-relation approaches, particularly in scenarios with isolated or sparsely connected nodes. The model shows generalizability to external datasets, achieving high accuracy in identifying disease-gene associations and drug-phenotype relationships. Notably, MedGraphNet accurately inferred drug side effects without direct training on such data. Using Alzheimer's disease as a case study, MedGraphNet successfully identified relevant phenotypes, genes, and drugs, corroborated by existing literature. These findings demonstrate the potential of integrating multi-relational data with text knowledge to enhance biomedical predictions and drug repurposing for diseases.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the failure of current methods to integrate diverse multi-relational biological data, which limits the discovery of new risk genes and drugs. Existing methods typically model a single relation type, such as disease-gene or disease-drug, and cope poorly with the small sample sizes of rare diseases and with the lack of a single risk factor. These limitations hinder understanding of how genetic, molecular, and environmental factors interact in complex diseases, and in turn slow the discovery of new drug targets and treatments. To overcome them, the paper proposes MedGraphNet, a multi-relational graph neural network (GNN) that infers relationships among drugs, genes, diseases, and phenotypes. MedGraphNet initializes its nodes with embeddings derived from existing text knowledge, which lets it integrate different data types more robustly and generalize better. The approach is aimed at predicting previously unknown risk genes and drug-disease associations, particularly for rare diseases. The paper reports that MedGraphNet matches and often outperforms traditional single-relation approaches, especially for isolated or sparsely connected nodes. It also reports that MedGraphNet accurately inferred drug side effects without being trained directly on such data, further indicating its potential for biomedical prediction.
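The idea described above (nodes initialized from text-derived embeddings, then updated by relation-specific message passing) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an R-GCN-style layer with mean aggregation, uses random vectors as a stand-in for text embeddings, and every name in it (`relational_layer`, `score_link`, the node and relation labels) is hypothetical.

```python
import numpy as np


def relational_layer(h, edges_by_relation, w_rel, w_self):
    """One R-GCN-style layer: a self-loop plus per-relation mean aggregation."""
    out = h @ w_self
    for rel, edges in edges_by_relation.items():
        agg = np.zeros_like(h)
        deg = np.zeros(h.shape[0])
        for src, dst in edges:
            agg[dst] += h[src]  # message from src to dst
            deg[dst] += 1
        has_neighbors = deg > 0
        agg[has_neighbors] /= deg[has_neighbors][:, None]  # mean over neighbors
        out += agg @ w_rel[rel]  # each relation type has its own weight matrix
    return np.maximum(out, 0.0)  # ReLU


def score_link(h, a, b):
    """Dot-product score between two nodes, squashed to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(h[a] @ h[b])))


rng = np.random.default_rng(0)
nodes = ["drug_A", "gene_B", "disease_C", "phenotype_D"]  # hypothetical nodes
dim = 8

# Assumption: random vectors stand in for text-derived node embeddings.
h0 = rng.normal(size=(len(nodes), dim))

# Toy graph with three relation types; edges are listed in both directions.
edges_by_relation = {
    "drug_gene": [(0, 1), (1, 0)],
    "gene_disease": [(1, 2), (2, 1)],
    "disease_phenotype": [(2, 3), (3, 2)],
}

# Untrained weights; a real model would learn these from known links.
w_self = rng.normal(scale=0.1, size=(dim, dim))
w_rel = {rel: rng.normal(scale=0.1, size=(dim, dim)) for rel in edges_by_relation}

h1 = relational_layer(h0, edges_by_relation, w_rel, w_self)

# Score a drug-disease pair that has no direct edge in the toy graph.
drug_disease_score = score_link(h1, 0, 2)
```

Because every node starts from an embedding rather than from its graph neighborhood alone, an isolated node still has a usable representation through the self-loop term, which is one plausible reading of why text initialization helps with sparsely connected nodes.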