Unifying Invariant and Variant Features for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li
2024-07-22
Abstract: Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has considerable real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraphs and result in suboptimal generalization. To address this challenge, we propose exploiting Probability of Necessity and Sufficiency (PNS) to extract sufficient and necessary invariant substructures. Beyond that, we further leverage the domain variant subgraphs related to the labels to boost the generalization performance in an ensemble manner. Specifically, we first consider the data generation process for graph data. Under mild conditions, we show that the sufficient and necessary invariant subgraph can be extracted by minimizing an upper bound, built on the theoretical advance of the probability of necessity and sufficiency. To further bridge the theory and algorithm, we devise the model called Sufficiency and Necessity Inspired Graph Learning (SNIGL), which ensembles an invariant subgraph classifier on top of latent sufficient and necessary invariant subgraphs, and a domain variant subgraph classifier specific to the test domain for generalization enhancement. Experimental results demonstrate that our SNIGL model outperforms the state-of-the-art techniques on six public benchmarks, highlighting its effectiveness in real-world scenarios.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the Out-of-Distribution (OOD) generalization challenges that Graph Neural Networks (GNNs) face on graph data. Specifically, it targets the following issues:

1. **Extracting Optimal Invariant Subgraphs**: Existing methods extract invariant features through environment augmentation to achieve domain generalization. However, they often struggle to strike an optimal trade-off between invariance alignment and prediction accuracy. This can lead to the loss or redundancy of semantic subgraphs, which hurts generalization performance.
2. **Utilizing Necessary and Sufficient Invariant Substructures**: The paper proposes a framework that uses the Probability of Necessity and Sufficiency (PNS) to extract invariant substructures that are both sufficient and necessary for the label. The aim is to keep every substructure that is crucial for prediction while discarding redundant ones.
3. **Incorporating Domain-Variant Features**: To further improve performance on unseen data, the paper also considers domain-variant subgraphs that are related to the labels and combines them with the invariant subgraphs in an ensemble.

In summary, the core contribution is a method named Sufficiency and Necessity Inspired Graph Learning (SNIGL). It extracts necessary and sufficient invariant subgraph features from the training data and combines them with domain-specific variant features to improve domain generalization on graph data. Experimental results show that SNIGL outperforms the current state-of-the-art techniques on six public benchmarks.
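
To make the notion of PNS more concrete, the sketch below estimates the classical bounds on the Probability of Necessity and Sufficiency for a binary cause and a binary effect. It is an illustration of the general causal quantity, not SNIGL's training objective, which the paper derives under its own assumptions. Here `x = 1` can be read as "a candidate subgraph is present" and `y = 1` as "the graph has the target label"; the function name `pns_bounds` and the toy data are hypothetical. The bounds used, max(0, P(y|x) - P(y|x')) <= PNS <= min(P(y|x), P(y'|x')), hold only under the exogeneity assumption (no confounding between cause and effect); if monotonicity also holds, PNS equals P(y|x) - P(y|x'), which coincides with the lower bound whenever that difference is non-negative.

```python
from typing import Sequence, Tuple


def pns_bounds(x: Sequence[int], y: Sequence[int]) -> Tuple[float, float]:
    """Estimate lower/upper bounds on PNS for binary x (cause) and y (effect).

    Assumes exogeneity, i.e. x is not confounded with y (an assumption of
    this sketch, not something checked by the code).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    n_x1 = sum(1 for xi in x if xi == 1)
    n_x0 = len(x) - n_x1
    if n_x1 == 0 or n_x0 == 0:
        raise ValueError("need samples with both x = 1 and x = 0")

    # Empirical conditional probabilities P(y=1 | x=1) and P(y=1 | x=0).
    p_y_x1 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) / n_x1
    p_y_x0 = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1) / n_x0

    lower = max(0.0, p_y_x1 - p_y_x0)    # max(0, P(y|x) - P(y|x'))
    upper = min(p_y_x1, 1.0 - p_y_x0)    # min(P(y|x), P(y'|x'))
    return lower, upper


# Toy data: the feature is present in the first four samples only.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1]
lower, upper = pns_bounds(x, y)
print(f"{lower:.2f} <= PNS <= {upper:.2f}")
```

A feature with high PNS is one whose presence tends to produce the label (sufficiency) and whose absence tends to remove it (necessity), which is the property the paper asks of the extracted invariant subgraph.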