KGEx: Explaining Knowledge Graph Embeddings via Subgraph Sampling and Knowledge Distillation

Vasileios Baltatzis, Luca Costabello
2023-10-02
Abstract: Although knowledge graph embeddings (KGE) are the go-to choice for link prediction on knowledge graphs, their interpretability remains relatively unexplored. We present KGEx, a novel post-hoc method that explains individual link predictions by drawing inspiration from surrogate models research. Given a target triple to predict, KGEx trains surrogate KGE models that we use to identify important training triples. To gauge the impact of a training triple, we sample random portions of the target triple neighborhood and we train multiple surrogate KGE models on each of them. To ensure faithfulness, each surrogate is trained by distilling knowledge from the original KGE model. We then assess how well surrogates predict the target triple being explained, the intuition being that those leading to faithful predictions have been trained on impactful neighborhood samples. Under this assumption, we then harvest triples that appear frequently across impactful neighborhoods. We conduct extensive experiments on two publicly available datasets to demonstrate that KGEx is capable of providing explanations faithful to the black-box model.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the lack of interpretability of Knowledge Graph Embedding (KGE) models in link prediction. KGE models perform well on link prediction tasks, but they are black boxes: their predictions are hard to explain, which undermines user trust, troubleshooting, and compliance. The paper therefore proposes KGEx, a post-hoc, local explanation method for KGE models that aims to provide faithful and interpretable explanations of individual link predictions through subgraph sampling and knowledge distillation. Specifically, the main contributions of KGEx include:

1. **Proposing a new post-hoc explanation method**: Training surrogate KGE models to identify the training triples that have a significant impact on the target link prediction.
2. **Utilizing subgraph sampling**: Sampling subgraphs of the original knowledge graph around the target triple to limit the search space for explanations while maintaining fidelity to the target link prediction.
3. **Applying knowledge distillation**: Distilling knowledge from the pre-trained KGE model (the teacher) into each surrogate model (the student), so that every surrogate stays faithful to the original model.
4. **Evaluating through a Monte Carlo process**: Repeatedly training multiple surrogate models and ranking the training triples by their contribution to the target link prediction, thereby generating the final list of explanations.

Because these techniques operate post hoc on an already trained model, KGEx adds interpretability without sacrificing prediction performance, making KGE models more suitable for practical application scenarios.
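
The sampling-and-ranking loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the parameters (`hops`, `n_rounds`, `sample_frac`, `top_frac`), and the rule of keeping a fixed top fraction of runs are all assumptions made for illustration. In particular, `surrogate_score` is a hypothetical callable standing in for the expensive step the paper describes, namely training a surrogate KGE model on a neighborhood sample by distilling from the original model and measuring how faithfully it predicts the target triple.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence

Triple = tuple[Hashable, Hashable, Hashable]  # (subject, predicate, object)


def neighborhood(triples: Sequence[Triple], target: Triple, hops: int = 1) -> list[Triple]:
    """Collect training triples within `hops` of the target's subject or object."""
    frontier = {target[0], target[2]}
    selected: set[Triple] = set()
    for _ in range(hops):
        reached = set()
        for s, p, o in triples:
            if s in frontier or o in frontier:
                selected.add((s, p, o))
                reached.update((s, o))
        frontier |= reached
    selected.discard(target)  # never explain a triple with itself
    return [t for t in triples if t in selected]  # keep the input order


def rank_triples(
    neigh: Sequence[Triple],
    surrogate_score: Callable[[Sequence[Triple]], float],  # hypothetical: higher = more faithful
    n_rounds: int = 200,
    sample_frac: float = 0.5,
    top_frac: float = 0.25,
    seed: int = 0,
) -> list[tuple[Triple, int]]:
    """Monte Carlo ranking: count how often each triple appears in the best-scoring samples."""
    rng = random.Random(seed)
    k = max(1, int(len(neigh) * sample_frac))
    runs = []
    for _ in range(n_rounds):
        sample = rng.sample(list(neigh), k)  # random portion of the neighborhood
        runs.append((surrogate_score(sample), sample))
    runs.sort(key=lambda run: run[0], reverse=True)
    n_top = max(1, int(n_rounds * top_frac))  # assumed cut-off for "impactful" samples
    counts = Counter(t for _, sample in runs[:n_top] for t in sample)
    return counts.most_common()
```

In a real setting, `surrogate_score` would wrap surrogate training with a distillation loss against the original model's scores. For a quick check, any function of the sample can be plugged in, for example one that returns 1.0 when a known supporting triple is present and 0.0 otherwise, in which case that triple should come out at the top of the ranking.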