CIDER: Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference

Qibin Zhang, Chengshang Lyu, Lingxi Chen, Qiqi Jin, Luonan Chen
2024-07-28
Abstract: Inferring causal links or subgraphs corresponding to a specific phenotype or label based solely on measured data is an important yet challenging task, which is also different from inferring causal nodes. While Graph Neural Network (GNN) Explainers have shown potential in subgraph identification, existing methods with GNN often offer associative rather than causal insights. This lack of transparency and explainability hinders our understanding of their results and also underlying mechanisms. To address this issue, we propose a novel method of causal link/subgraph inference, called CIDER: Counterfactual-Invariant Diffusion-based GNN ExplaineR, by implementing both counterfactual and diffusion implementations. In other words, it is a model-agnostic and task-agnostic framework for generating causal explanations based on a counterfactual-invariant and diffusion process, which provides not only causal subgraphs due to counterfactual implementation but reliable causal links due to the diffusion process. Specifically, CIDER is first formulated as an inference task that generatively provides the two distributions of one causal subgraph and another spurious subgraph. Then, to enhance the reliability, we further model the CIDER framework as a diffusion process. Thus, using the causal subgraph distribution, we can explicitly quantify the contribution of each subgraph to a phenotype/label in a counterfactual manner, representing each subgraph's causal strength. From a causality perspective, CIDER is an interventional causal method, different from traditional association studies or observational causal approaches, and can also reduce the effects of unobserved confounders. We evaluate CIDER on both synthetic and real-world datasets, which all demonstrate the superiority of CIDER over state-of-the-art methods.
Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
This paper addresses how to infer the causal subgraph associated with a specific phenotype or label from measured data alone. The task differs from inferring causal nodes, and most existing graph neural network (GNN) explainers offer associative rather than causal insights. This lack of transparency and explainability hinders understanding of both the results and their underlying mechanisms. Specifically, the paper targets the following problems:

1. **Limitations of existing GNN explainers**: Existing GNN explainers usually provide associative rather than causal explanations. They can point out which subgraphs are related to a specific task, but cannot determine whether those subgraphs actually affect the sample's label or phenotype.
2. **Impact of unobserved confounders**: Traditional methods struggle to handle unobserved confounders, which may lead to incorrect causal inferences.
3. **Improving the reliability and accuracy of causal inference**: A new method is needed that improves the reliability of causal inference and can quantify the causal contribution of each subgraph to the label.

To solve these problems, the paper proposes a new framework named CIDER (Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference). CIDER aims to infer the causal subgraph and its causal strength with respect to the label directly from high-dimensional measured data by combining counterfactual invariance with a diffusion process. Specifically, CIDER pursues the following goals:

- **Generate the distributions of causal and spurious subgraphs**: By dividing the entire graph into a causal subgraph and a spurious subgraph, CIDER can directly predict the distributions of both.
- **Enhance inference with the diffusion process**: By denoising and refining the spurious subgraphs at each step of the diffusion process, CIDER gradually converges to a more reliable distribution of causal subgraphs during training.
- **Reduce the impact of unobserved confounders**: As an interventional causal method, CIDER can reduce the impact of unobserved confounders on causal inference.

In summary, the main purpose of this paper is to develop a method that can accurately identify causal subgraphs from data and explain their impact on labels or phenotypes, thereby overcoming the limitations of existing GNN explainers and improving the reliability and accuracy of causal inference.
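
The idea of splitting a graph into a causal part and a spurious part, then scoring the causal part counterfactually, can be illustrated with a toy sketch. This is not CIDER's implementation: the names `split_by_mask`, `toy_model`, and `counterfactual_strength` are hypothetical, the per-edge probabilities are supplied by hand rather than learned, the diffusion step is omitted entirely, and the "model" is a one-layer stand-in for a trained GNN. It only shows how removing a candidate subgraph and measuring the change in prediction gives a simple contribution score.

```python
import numpy as np

def split_by_mask(adj, edge_prob, threshold=0.5):
    """Split an adjacency matrix into causal and spurious parts.

    edge_prob is a hypothetical per-edge probability of being causal.
    An explainer would learn it; here it is supplied by hand.
    """
    causal_mask = (edge_prob >= threshold) & (adj > 0)
    causal = np.where(causal_mask, adj, 0.0)
    spurious = adj - causal
    return causal, spurious

def toy_model(adj, x, w):
    """Stand-in for a trained GNN (assumption, not the paper's model):
    one round of sum aggregation, mean readout, then a sigmoid."""
    h = (adj + np.eye(adj.shape[0])) @ x   # aggregate neighbours plus self
    logit = float(h.mean(axis=0) @ w)      # graph-level readout
    return 1.0 / (1.0 + np.exp(-logit))

def counterfactual_strength(adj, causal, x, w):
    """Change in prediction when the candidate causal edges are removed."""
    return toy_model(adj, x, w) - toy_model(adj - causal, x, w)

# Path graph 0-1-2-3 with symmetric (undirected) edges.
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

# Hand-picked probabilities: edges 0-1 and 1-2 are treated as causal.
edge_prob = np.zeros((4, 4))
for (i, j), p in {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1}.items():
    edge_prob[i, j] = edge_prob[j, i] = p

x = np.ones((4, 1))    # one feature per node
w = np.array([0.5])    # readout weight of the toy model

causal, spurious = split_by_mask(adj, edge_prob)
strength = counterfactual_strength(adj, causal, x, w)
print(f"counterfactual strength of the causal subgraph: {strength:.3f}")
```

In this sketch a larger score means that deleting the candidate subgraph moves the prediction further, which is the intuition behind a counterfactual contribution score. A real explainer would learn `edge_prob` and evaluate against the trained GNN being explained.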