Heterogeneous Causal Metapath Graph Neural Network for Gene-Microbe-Disease Association Prediction

Kexin Zhang, Feng Huang, Luotao Liu, Zhankun Xiong, Hongyu Zhang, Yuan Quan, Wen Zhang
2024-06-27
Abstract: The recent focus on microbes in human medicine highlights their potential role in the genetic framework of diseases. To decode the complex interactions among genes, microbes, and diseases, computational prediction of gene-microbe-disease (GMD) associations is crucial. Existing methods primarily address gene-disease and microbe-disease associations, but the more intricate triple-wise GMD associations remain less explored. In this paper, we propose a Heterogeneous Causal Metapath Graph Neural Network (HCMGNN) to predict GMD associations. HCMGNN constructs a heterogeneous graph linking genes, microbes, and diseases through their pairwise associations, and uses six predefined causal metapaths to extract directed causal subgraphs, which enable a multi-view analysis of the causal relations among the three entity types. Within each subgraph, we employ a causal semantic sharing message passing network for node representation learning, coupled with an attentive fusion method that integrates these representations to predict GMD associations. Our extensive experiments show that HCMGNN effectively predicts GMD associations and addresses the association sparsity issue by enhancing the graph's semantics and structure.
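The graph-construction and metapath-extraction steps described in the abstract can be illustrated with a minimal Python sketch under stated assumptions. The toy edge sets and the helper names (`build_hetero_graph`, `causal_subgraph`, `metapath_instances`) are hypothetical, and treating the six causal metapaths as the 3! orderings of G, M, and D is an assumption. The abstract does not specify how each directed subgraph is built, so here a subgraph simply keeps the edges oriented along its metapath.

```python
from itertools import permutations

# Hypothetical toy data: undirected pairwise associations between entity IDs.
GENE_MICROBE = {("g1", "m1"), ("g2", "m1")}
MICROBE_DISEASE = {("m1", "d1")}
GENE_DISEASE = {("g1", "d1"), ("g2", "d2")}


def build_hetero_graph(gm, md, gd):
    """Map each ordered type pair, e.g. ("G", "M"), to its set of edges."""
    graph = {}
    pairs = {("G", "M"): gm, ("M", "D"): md, ("G", "D"): gd}
    for (a, b), edges in pairs.items():
        graph[(a, b)] = set(edges)
        graph[(b, a)] = {(y, x) for x, y in edges}  # store the reverse orientation too
    return graph


# Assumption: the six causal metapaths are all orderings of the three types.
CAUSAL_METAPATHS = list(permutations("GMD"))


def causal_subgraph(graph, metapath):
    """Keep only the edges oriented along the metapath, e.g. G->M and M->D."""
    first, mid, last = metapath
    return {(first, mid): graph[(first, mid)], (mid, last): graph[(mid, last)]}


def metapath_instances(graph, metapath):
    """Enumerate node chains that follow the metapath through its subgraph."""
    sub = causal_subgraph(graph, metapath)
    first, mid, last = metapath
    return {
        (u, v, w)
        for u, v in sub[(first, mid)]
        for v2, w in sub[(mid, last)]
        if v == v2
    }
```

For the toy data, `metapath_instances(graph, ("G", "M", "D"))` yields the chains g1→m1→d1 and g2→m1→d1, while the G-D-M view yields only g1→d1→m1. This shows how different metapaths expose different parts of the same graph.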
Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of predicting ternary (triple-wise) associations among genes, microbes, and diseases (GMD). Existing methods mainly focus on binary gene-disease and microbe-disease associations, while the more complex ternary GMD associations have received less attention. To predict them, the paper proposes a Heterogeneous Causal Metapath Graph Neural Network (HCMGNN), which proceeds in the following steps:

1. **Constructing a Heterogeneous Graph**: HCMGNN first builds a heterogeneous graph whose nodes are genes, microbes, and diseases and whose edges are their known pairwise associations.
2. **Generating Causal Subgraphs**: Six predefined causal metapaths (such as G-M-D and G-D-M) are used to extract directed causal subgraphs from the heterogeneous graph. These subgraphs enable a multi-view analysis of the causal relations among the three entity types.
3. **Message Passing within Subgraphs**: Within each subgraph, a causal semantic sharing message passing network learns node representations; an attentive fusion method then integrates these representations to predict GMD associations.
4. **Model Training**: The model's parameters are learned by optimizing a loss function on the association prediction task.

Through these steps, HCMGNN effectively predicts GMD associations and mitigates the association sparsity issue by enriching the graph's semantic and structural information. According to the paper's experiments, HCMGNN outperforms the baseline methods on various evaluation metrics.
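Steps 3 and 4 can be sketched in the same hypothetical style. The sketch assumes each node already has one embedding per causal subgraph (the message passing network itself is omitted). It fuses those embeddings with a simple attention scheme: a single query vector is scored against the tanh-transformed views, followed by a softmax. It then scores a (gene, microbe, disease) triple with a trilinear product passed through a sigmoid and measures fit with binary cross-entropy. The attention form, the decoder, the loss, and all function names are assumptions, not details confirmed by the summary.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fusion(views, query):
    """Fuse one node's per-metapath embeddings into a single vector.

    views: equal-length vectors, one per causal subgraph (hypothetical input).
    query: attention vector; it would be learned, but is fixed here.
    Assumption: score_k = query . tanh(view_k), with no projection matrix.
    Returns the fused vector and the attention weights.
    """
    scores = [sum(q * math.tanh(x) for q, x in zip(query, h)) for h in views]
    weights = softmax(scores)
    dim = len(views[0])
    fused = [sum(w * h[i] for w, h in zip(weights, views)) for i in range(dim)]
    return fused, weights


def triple_score(g, m, d):
    """Assumed decoder: trilinear product of the three embeddings -> sigmoid."""
    logit = sum(a * b * c for a, b, c in zip(g, m, d))
    return 1.0 / (1.0 + math.exp(-logit))


def bce_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy over scored triples (label 1 = known association)."""
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    ) / len(probs)
```

In a full model, the query vector and the upstream embeddings would be trained jointly by minimizing this loss over known GMD triples and sampled negatives. The negative-sampling scheme is likewise an assumption.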