Matthew Morris, David J. Tena Cucala, Bernardo Cuenca Grau, Ian Horrocks
Abstract: Graph neural networks (GNNs) are frequently used to predict missing facts in knowledge graphs (KGs). Motivated by the lack of explainability for the outputs of these models, recent work has aimed to explain their predictions using Datalog, a widely used logic-based formalism. However, such work has been restricted to certain subclasses of GNNs. In this paper, we consider one of the most popular GNN architectures for KGs, R-GCN, and we provide two methods to extract rules that explain its predictions and are sound, in the sense that each fact derived by the rules is also predicted by the GNN, for any input dataset. Furthermore, we provide a method that can verify that certain classes of Datalog rules are not sound for the R-GCN. In our experiments, we train R-GCNs on KG completion benchmarks, and we are able to verify that no Datalog rule is sound for these models, even though the models often obtain high to near-perfect accuracy. This raises some concerns about the ability of R-GCN models to generalise and about the explainability of their predictions. We further provide two variations to the training paradigm of R-GCN that encourage it to learn sound rules and find a trade-off between model accuracy and the number of learned sound rules.
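For orientation, the R-GCN layer (Schlichtkrull et al.) updates each node's vector by combining its own vector with relation-specific messages from its neighbours. The following is a minimal pure-Python sketch, assuming ReLU, an explicit bias, and plain sum aggregation by default (the original normalisation constant c_{v,r} is exposed as the hypothetical `normalise` option). Inverse relations and basis decomposition are omitted, and all names are illustrative rather than the authors' code.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = output channels, columns = input channels
Vector = List[float]
Edge = Tuple[str, str, str]  # (source node, relation, target node)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix by a vector using plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(
    h: Dict[str, Vector],
    edges: List[Edge],
    w_self: Matrix,
    w_rel: Dict[str, Matrix],
    bias: Vector,
    normalise: bool = False,
) -> Dict[str, Vector]:
    """One R-GCN layer:
    h'_v = ReLU(W_0 h_v + sum_r sum_{u in N_r(v)} (1 / c_{v,r}) W_r h_u + b),
    with c_{v,r} = |N_r(v)| if normalise is True, else 1 (plain sum)."""
    out: Dict[str, Vector] = {}
    for v, hv in h.items():
        acc = [a + b for a, b in zip(matvec(w_self, hv), bias)]
        for rel, w in w_rel.items():
            # N_r(v): sources of r-edges that point at v
            neighbours = [u for (u, r, t) in edges if r == rel and t == v]
            c = len(neighbours) if (normalise and neighbours) else 1
            for u in neighbours:
                msg = matvec(w, h[u])
                acc = [a + m / c for a, m in zip(acc, msg)]
        out[v] = [max(0.0, a) for a in acc]  # ReLU
    return out


h = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
edges = [("a", "p", "b")]
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_rel = {"p": [[-1.0, 0.0], [1.0, 0.0]]}  # one negative weight
print(rgcn_layer(h, edges, w_self, w_rel, bias=[0.0, 0.0]))
# {'a': [1.0, 0.0], 'b': [0.0, 2.0]}
```

The negative weight in `W_p` pushes channel 0 of node `b` below zero, where ReLU cuts it off; negative weights of this kind are what can make a channel's value drop when facts are added.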
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper addresses the explainability of Relational Graph Convolutional Networks (R-GCNs) for knowledge graph (KG) completion. Specifically, the authors study how to extract Datalog rules that explain an R-GCN's predictions and are sound, meaning that, for any input dataset, every fact derived by the rules is also predicted by the R-GCN. Prior work on explaining GNN predictions with Datalog was restricted to certain subclasses of GNNs, leaving popular architectures such as R-GCN uncovered. The authors' finding that R-GCNs trained in the standard way learn no sound Datalog rules, even when they are highly accurate, raises concerns about the generalization ability and explainability of these models.
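Because soundness quantifies over every input dataset, it cannot be established by testing. Testing can only refute it by exhibiting one dataset on which the rule derives a fact the model does not predict. The sketch below does exactly that on a finite list of datasets. The rule format (binary atoms with `?`-prefixed variables), `find_counterexample`, and `toy_model` (a hypothetical non-monotonic predictor standing in for a trained R-GCN) are illustrative assumptions, not the paper's tooling.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (predicate, subject, object); constants only
Atom = Tuple[str, str, str]  # same shape; arguments are variables such as "?x"
Model = Callable[[Set[Fact]], Set[Fact]]


def matches(body: List[Atom], data: Set[Fact],
            binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that maps all body atoms onto facts in data."""
    if not body:
        yield binding
        return
    pred, s, o = body[0]
    for p, a, b in data:
        if p != pred:
            continue
        new = dict(binding)
        # Bind each variable, rejecting clashes with an earlier binding.
        if new.setdefault(s, a) == a and new.setdefault(o, b) == b:
            yield from matches(body[1:], data, new)


def apply_rule(head: Atom, body: List[Atom], data: Set[Fact]) -> Set[Fact]:
    """Facts derived by one application of the rule `head :- body` to data."""
    return {(head[0], m[head[1]], m[head[2]]) for m in matches(body, data, {})}


def find_counterexample(head: Atom, body: List[Atom], model: Model,
                        datasets: Iterable[Set[Fact]]
                        ) -> Optional[Tuple[Set[Fact], Fact]]:
    """Return (dataset, fact) where the rule derives a fact the model misses."""
    for data in datasets:
        predicted = model(data)
        for fact in sorted(apply_rule(head, body, data)):
            if fact not in predicted:
                return data, fact
    return None  # no refutation found; this does NOT prove soundness


def toy_model(data: Set[Fact]) -> Set[Fact]:
    """Hypothetical non-monotonic predictor: derives knows(x, y) from
    friend(x, y) only if x has fewer than two friend facts."""
    degree: Dict[str, int] = {}
    for p, s, _ in data:
        if p == "friend":
            degree[s] = degree.get(s, 0) + 1
    return {("knows", s, o) for p, s, o in data
            if p == "friend" and degree[s] < 2}


head, body = ("knows", "?x", "?y"), [("friend", "?x", "?y")]
d1 = {("friend", "a", "b")}
d2 = d1 | {("friend", "a", "c")}
print(find_counterexample(head, body, toy_model, [d1]))  # None: holds on d1
print(find_counterexample(head, body, toy_model, [d1, d2]))  # fails on d2
```

The rule holds on its own body `d1` yet fails on the larger dataset `d2`, which is why the non-monotonic behavior of a model is the obstacle to soundness.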
### Main Contributions
1. **Identifying Output Channels with Monotonic Behavior**:
- The authors propose a method to identify a subset of R-GCN output channels that behave monotonically; these channels can be used to extract sound Datalog rules.
- The method is data-independent: it analyzes the dependencies between model parameters rather than any particular dataset.
2. **Verifying Unsound Output Channels**:
- The authors also provide a method to identify unbounded output channels, which exhibit non-monotonic behavior and from which no sound Datalog rule can be extracted. This makes it possible to verify that certain classes of Datalog rules are not sound for a given R-GCN.
3. **Experimental Validation**:
- The authors trained R-GCNs on KG completion benchmarks and found that all channels of the trained models are unbounded, so no sound Datalog rule exists for them, even though the models often reach high to near-perfect accuracy.
- They proposed two variations of the training process that encourage R-GCNs to learn sound rules. When weights close to zero are clamped during training, more output channels behave monotonically as the clamping threshold increases, so more sound rules can be extracted, but the model's accuracy decreases.
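The clamping variant can be pictured as a step applied to the weight matrices during training. The following is a minimal sketch, assuming that "clamping" means setting every weight whose magnitude falls below a threshold `t` to zero; the summary does not spell this out, and the paper's exact rule, schedule, and choice of parameters may differ. The helper names are hypothetical, and `nonnegative_rows` checks only one matrix, which is a necessary but not sufficient condition for a channel to be safe.

```python
import copy
from typing import List

Matrix = List[List[float]]  # rows = output channels, columns = input channels


def clamp_small_weights(w: Matrix, threshold: float) -> int:
    """Set every nonzero weight with |w_ij| < threshold to 0 (in place).

    Returns the number of weights that were changed."""
    changed = 0
    for row in w:
        for j, x in enumerate(row):
            if x != 0.0 and abs(x) < threshold:
                row[j] = 0.0
                changed += 1
    return changed


def nonnegative_rows(w: Matrix) -> List[int]:
    """Output channels whose incoming weights in this matrix are all >= 0."""
    return [i for i, row in enumerate(w) if all(x >= 0.0 for x in row)]


weights = [[0.5, -0.05], [-0.3, 0.2], [0.01, 0.4]]
for t in (0.0, 0.1, 0.5):
    w = copy.deepcopy(weights)  # clamp a fresh copy for each threshold
    n = clamp_small_weights(w, t)
    print(t, n, nonnegative_rows(w))
# 0.0 0 [2]
# 0.1 2 [0, 2]
# 0.5 5 [0, 1, 2]
```

Raising the threshold turns more channels non-negative, but it also zeroes more of the model's parameters, mirroring the accuracy trade-off described above. In a PyTorch training loop, the analogous step would typically modify each parameter tensor inside `torch.no_grad()` after `optimizer.step()`.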
### Method Overview
1. **Safe Channels**:
- Defined "safe channels": channels whose values, for any dataset, are influenced only by non-negative weights.
- Proved that the value of a safe channel can only increase or stay the same when new facts are added to the input dataset.
2. **Stable and Incremental Channels**:
- Further classified channel behavior into "stable," "incremental," and "decremental" channels.
- Proved that stable and incremental channels respond predictably to the addition of facts, so they can also be used to extract sound rules.
3. **Unbounded Channels**:
- Identified unbounded channels, from which no sound Datalog rule can be extracted.
- Provided a data-independent method to verify the non-monotonic behavior of these channels.
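Safety can be checked without looking at any dataset by propagating it through the layers. The following is a minimal sketch of one such check, assuming sum aggregation (mean normalisation can make values drop when neighbours are added), ReLU activations, and non-negative input features that can only switch on as facts are added. Under those assumptions, a channel whose nonzero incoming weights are all positive and come from safe channels cannot decrease; biases are constants and are ignored. This is an illustrative sufficient condition in the spirit of the summary, not the paper's exact definition, and it does not separate stable, incremental, and unbounded channels.

```python
from typing import List, Set

Matrix = List[List[float]]  # rows = output channels, columns = input channels


def safe_channels(layers: List[List[Matrix]]) -> List[Set[int]]:
    """Return, for each layer, the set of channels marked safe.

    layers[l] holds every matrix of layer l + 1 (the self-loop W_0 and one
    W_r per relation). All input channels are assumed safe."""
    prev: Set[int] = set(range(len(layers[0][0][0])))  # input channels
    per_layer: List[Set[int]] = []
    for mats in layers:
        n_out = len(mats[0])
        current = {
            i for i in range(n_out)
            # every nonzero incoming weight must be positive and come
            # from a channel that was safe in the previous layer
            if all(w[i][k] == 0.0 or (w[i][k] > 0.0 and k in prev)
                   for w in mats for k in range(len(w[i])))
        }
        per_layer.append(current)
        prev = current
    return per_layer


layer1 = [
    [[1.0, 0.0], [0.0, 1.0]],   # W_0
    [[0.5, 0.0], [-1.0, 0.0]],  # W_r: channel 1 receives a negative weight
]
layer2 = [
    [[1.0, 0.0], [0.2, 0.3]],   # channel 1 reads from unsafe channel 1
    [[2.0, 0.0], [0.0, 0.0]],   # W_r
]
print(safe_channels([layer1, layer2]))  # [{0}, {0}]
```

Channel 1 of the second layer has only positive weights, yet it is not marked safe, because one of them reads an unsafe channel; the check follows dependencies across layers rather than inspecting each matrix in isolation.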
### Experimental Results
- Experiments on multiple benchmark datasets show that all channels of R-GCNs trained in the standard way are unbounded, so no sound Datalog rule exists for them, even when the models achieve high to near-perfect accuracy.
- Modifying the training process increases the number of channels that behave monotonically, and hence the number of sound rules, at the cost of lower accuracy. This reveals a trade-off between predictive performance and rule extraction.
### Conclusion
This paper provides methods for improving the explainability of R-GCNs by identifying channels that behave monotonically and verifying those that do not. However, the experiments also show that R-GCNs trained in the standard way may learn no sound Datalog rules at all, which points to directions for further research.