Knowledge Graph Completion with Counterfactual Augmentation

Heng Chang,Jie Cai,Jia Li
DOI: https://doi.org/10.48550/arXiv.2302.13083
2023-02-25
Abstract: Graph Neural Networks (GNNs) have demonstrated great success in Knowledge Graph Completion (KGC) by modeling how entities and relations interact in recent years. However, most of them are designed to learn from the observed graph structure, which appears to have imbalanced relation distribution during the training stage. Motivated by the causal relationship among the entities on a knowledge graph, we explore this defect through a counterfactual question: "would the relation still exist if the neighborhood of entities became different from observation?". With a carefully designed instantiation of a causal model on the knowledge graph, we generate the counterfactual relations to answer the question by regarding the representations of entity pair given relation as context, structural information of relation-aware neighborhood as treatment, and validity of the composed triplet as the outcome. Furthermore, we incorporate the created counterfactual relations with the GNN-based framework on KGs to augment their learning of entity pair representations from both the observed and counterfactual relations. Experiments on benchmarks show that our proposed method outperforms existing methods on the task of KGC, achieving new state-of-the-art results. Moreover, we demonstrate that the proposed counterfactual relations-based augmentation also enhances the interpretability of the GNN-based framework through the path interpretations of predictions.
Machine Learning
What problem does this paper attempt to address?
This paper addresses two problems in the Knowledge Graph Completion (KGC) task:

1. **Imbalanced relation distribution**: In real-world knowledge graphs, the distribution of relation types is usually imbalanced. Most existing methods based on Graph Neural Networks (GNNs) learn only from the observed graph structure, so they handle this imbalance poorly during training, which hurts generalization and prediction accuracy.
2. **Insufficient causal modeling**: Existing methods do not fully consider the causal relationships among entities. Specifically, if the neighborhood of an entity pair were different, would the relation between the two entities still exist? This question has not been fully explored.

### Solutions

To address these problems, the paper proposes **Knowledge Graph Completion with Counterfactual Augmentation (KGCF)**. The core idea is to generate additional training data through counterfactual reasoning, thereby strengthening what the GNN model learns. The steps are as follows:

1. **Define the counterfactual question**: "Would the relation still exist if the neighborhood of the entities were different from what was observed?" This question probes the causal effect of neighborhood structure on the relation between entities.
2. **Construct a causal model**:
   - **Context**: The representation of the entity pair given the relation, \( \mathbf{z}_r(h, t) \).
   - **Treatment**: The structural information of the relation-aware neighborhood of the entity pair.
   - **Outcome**: The validity of the triple, \( A(h, r, t) \).
3. **Generate counterfactual relations**: Estimate counterfactual relations by matching each entity pair to the closest observed context. Specifically, for each entity pair \( (h, t) \), find its nearest neighbor in context space that has the opposite treatment, take that opposite treatment as the counterfactual treatment \( T_{CF}(h, r, t) \), and use the neighbor's outcome as the counterfactual relation.
4. **Augment the learning framework**: Combine the generated counterfactual relations with the original observed data to form a new training set. The model then learns entity pair representations from both the observed and the counterfactual relations, which improves prediction accuracy and generalization.

### Main contributions

- The paper presents what the authors describe as the first method to instantiate a causal model on a knowledge graph, improving KGC by answering counterfactual questions while taking relation types into account.
- It designs the KGCF framework, which uses counterfactual relations to augment GNN-based knowledge graph representation learning, with particular attention to imbalanced relation distribution.
- Experiments show that KGCF outperforms existing methods on the KGC task, and that the augmentation improves model interpretability through path-based explanations of predictions.

In conclusion, by combining causal reasoning with counterfactual augmentation, the paper tackles imbalanced relation distribution in knowledge graph completion and improves both the prediction performance and the generalization of the model.
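
The matching step described under the counterfactual relation generation above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a binary treatment, Euclidean distance between context vectors, and a hypothetical distance threshold `max_dist` beyond which no match is accepted (in which case the observed treatment and outcome are kept). The function name `counterfactual_outcomes` and all parameter names are illustrative.

```python
import numpy as np


def counterfactual_outcomes(context, treatment, outcome, max_dist=np.inf):
    """Nearest-neighbor matching sketch (illustrative, not the paper's code).

    context:   (n, d) array, one context vector per (h, r, t) sample.
    treatment: (n,) array of 0/1 treatments (binary treatment is an assumption).
    outcome:   (n,) array of 0/1 triple validities.
    max_dist:  hypothetical threshold; farther matches are rejected.

    Returns (cf_treatment, cf_outcome, matched).
    """
    context = np.asarray(context, dtype=float)
    treatment = np.asarray(treatment, dtype=int)
    outcome = np.asarray(outcome, dtype=int)

    # Pairwise Euclidean distances between all context vectors.
    diff = context[:, None, :] - context[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Only samples with the opposite treatment are eligible matches.
    opposite = treatment[:, None] != treatment[None, :]
    dist = np.where(opposite, dist, np.inf)

    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(context)), nearest]

    # A match needs an eligible candidate (finite distance) within the threshold.
    matched = np.isfinite(nearest_dist) & (nearest_dist <= max_dist)

    # Matched samples take the flipped treatment and the neighbor's outcome;
    # unmatched samples fall back to what was observed (an assumption here).
    cf_treatment = np.where(matched, 1 - treatment, treatment)
    cf_outcome = np.where(matched, outcome[nearest], outcome)
    return cf_treatment, cf_outcome, matched
```

The returned counterfactual treatments and outcomes could then be appended to the observed samples to form the augmented training set described in step 4. The all-pairs distance matrix keeps the sketch short but costs O(n²) memory, so a real pipeline would likely use an approximate nearest-neighbor index instead.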