Abstract: Graph Neural Networks (GNNs) have emerged as powerful representation learning tools for capturing complex dependencies within diverse graph-structured data. Despite their success in a wide range of graph mining tasks, GNNs have raised serious concerns regarding their trustworthiness, including susceptibility to distribution shift, biases towards certain populations, and lack of explainability. Recently, integrating causal learning techniques into GNNs has sparked numerous ground-breaking studies, because many GNN trustworthiness issues can be alleviated by capturing the underlying data causality rather than superficial correlations. In this survey, we comprehensively review recent research efforts on Causality-Inspired GNNs (CIGNNs). Specifically, we first employ causal tools to analyze the primary trustworthiness risks of existing GNNs, underscoring the necessity for GNNs to comprehend the causal mechanisms within graph data. Moreover, we introduce a taxonomy of CIGNNs based on the type of causal learning capability they are equipped with, i.e., causal reasoning and causal representation learning. In addition, we systematically introduce typical methods within each category and discuss how they mitigate trustworthiness risks. Finally, we summarize useful resources and discuss several future directions, hoping to shed light on new research opportunities in this emerging field. The representative papers, along with open-source data and code, are available at https://github.com/usail-hkust/Causality-Inspired-GNNs.
What problem does this paper attempt to address?
This paper addresses the trustworthiness of Graph Neural Networks (GNNs) in practical applications, which covers three specific risks:
1. **Insufficient Out-of-Distribution (OOD) Generalization**:
- GNNs are unstable when the training and test data distributions differ. This instability arises mainly because GNNs tend to capture spurious correlations between non-causal graph components and labels, and these correlations may change across data distributions. For example, in citation networks, an author's institution may causally affect both a paper's citation pattern and its influence, making the two spuriously correlated; in molecular graphs, selecting molecules that share the same skeleton may introduce selection bias, making that skeleton spuriously correlated with the labels.
2. **Unfairness**:
- GNNs may generate representations that are biased towards certain groups, producing unfair outcomes for specific groups of samples. Traditional correlation-based fairness notions may not fully solve this problem because they ignore the underlying causal mechanisms that give rise to unfairness. For example, sensitive attributes (such as gender or race) may be spuriously correlated with labels through various biases (such as data selection bias), or they may causally affect the labels. In either case, the node embeddings generated by GNNs are influenced by the sensitive attributes, which violates Graph Counterfactual Fairness (GCF), the requirement that a node's representation remain unchanged in a counterfactual world where the sensitive attributes had taken different values.
3. **Poor Explainability**:
- The information propagation process of GNNs is a black box, which makes GNNs hard to explain. This not only undermines the reliability of the model but also hinders developers' ability to diagnose and fix performance problems. Understanding the causal mechanisms in graph data can improve the explainability of GNNs, making them more reliable in high-stakes applications (such as fraud detection and criminal justice).
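The GCF violation described under item 2 can be probed with a simple counterfactual test: intervene on the sensitive attribute and measure how much the node embeddings move. The sketch below is a simplification under stated assumptions, not a method from the survey: the one-layer mean-aggregation encoder (`mean_aggregate_embed`), its fixed weights, and the helper names are all hypothetical, and the counterfactual flips only the sensitive attribute of every node while leaving its causal descendants (other features and edges) unchanged, which a full GCF analysis would also need to model.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]          # node -> list of neighbor nodes
Features = Dict[int, List[float]]     # node -> feature vector


def mean_aggregate_embed(graph: Graph, features: Features,
                         weights: List[float]) -> Dict[int, float]:
    """Hypothetical one-layer GNN: average each node's features with its
    neighbors' features, then take a weighted sum to get a scalar embedding."""
    dim = len(weights)
    embeddings = {}
    for node, neighbors in graph.items():
        group = [node] + neighbors
        mean = [sum(features[n][d] for n in group) / len(group) for d in range(dim)]
        embeddings[node] = sum(w * x for w, x in zip(weights, mean))
    return embeddings


def flip_sensitive(features: Features, sensitive_idx: int) -> Features:
    """Counterfactual features: flip the binary sensitive attribute of every node.
    Simplification: descendants of the sensitive attribute are left unchanged."""
    flipped = {}
    for node, x in features.items():
        x_cf = list(x)
        x_cf[sensitive_idx] = 1.0 - x_cf[sensitive_idx]
        flipped[node] = x_cf
    return flipped


def counterfactual_gap(graph: Graph, features: Features,
                       weights: List[float], sensitive_idx: int) -> float:
    """Largest change in any node embedding under the counterfactual flip.
    A gap of 0 means the embeddings are invariant to this intervention."""
    factual = mean_aggregate_embed(graph, features, weights)
    counterfactual = mean_aggregate_embed(
        graph, flip_sensitive(features, sensitive_idx), weights)
    return max(abs(factual[n] - counterfactual[n]) for n in graph)


graph = {0: [1], 1: [0, 2], 2: [1]}
# features[node] = [sensitive attribute (0/1), task-relevant feature]
features = {0: [1.0, 0.5], 1: [0.0, 0.2], 2: [1.0, 0.9]}
print(counterfactual_gap(graph, features, [0.8, 1.0], sensitive_idx=0))  # > 0: uses the sensitive attribute
print(counterfactual_gap(graph, features, [0.0, 1.0], sensitive_idx=0))  # 0.0: invariant to the flip
```

Note that neighborhood aggregation lets the sensitive attributes of neighbors leak into a node's embedding, which is why the check intervenes on all nodes rather than only the target node.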
To address these problems, the paper advocates integrating causal learning techniques into GNNs, yielding Causality-Inspired GNNs (CIGNNs). By capturing the causal relationships in graph data rather than superficial correlations, such models can improve out-of-distribution generalization, fairness, and explainability.
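Why capturing causal relationships helps OOD generalization can be seen in a toy simulation. The sketch below is illustrative only and assumes a hypothetical data-generating process, not one taken from the survey: each graph is reduced to two binary features, a causal `motif` that determines the label through a mechanism shared by all environments, and a non-causal `scaffold` whose agreement with the label (`spurious_agreement`) differs between the training and test environments, loosely mirroring the molecular-skeleton example in item 1.

```python
import random


def sample_env(n, spurious_agreement, label_noise, rng):
    """Draw n toy graphs from one environment.

    Each graph is summarized by two hypothetical binary features:
      motif:    presence of a causal substructure; it determines the label
                (up to label_noise), and this mechanism is the same everywhere.
      scaffold: a non-causal component that matches the label with
                probability spurious_agreement, which varies by environment.
    """
    data = []
    for _ in range(n):
        motif = rng.random() < 0.5
        label = motif if rng.random() >= label_noise else not motif
        scaffold = label if rng.random() < spurious_agreement else not label
        data.append((motif, scaffold, label))
    return data


def accuracy(data, feature_idx):
    """Accuracy of the rule 'predict label = feature' for a single feature."""
    return sum(row[feature_idx] == row[2] for row in data) / len(data)


rng = random.Random(0)
train = sample_env(10_000, spurious_agreement=0.95, label_noise=0.1, rng=rng)
test = sample_env(10_000, spurious_agreement=0.05, label_noise=0.1, rng=rng)

for name, idx in [("causal motif", 0), ("spurious scaffold", 1)]:
    print(f"{name}: train acc={accuracy(train, idx):.2f}, "
          f"test acc={accuracy(test, idx):.2f}")
```

With these parameters the scaffold rule should look better during training (roughly 0.95 versus 0.90 accuracy), so a purely correlation-driven learner would prefer it, yet its accuracy collapses to roughly 0.05 once the spurious agreement reverses, while the motif rule stays near 0.90 in both environments.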