Abstract: Counterfactuals have been established as a popular explainability technique which leverages a set of minimal edits to alter the prediction of a classifier. When considering conceptual counterfactuals on images, the edits requested should correspond to salient concepts present in the input data. At the same time, conceptual distances are defined by knowledge graphs, ensuring the optimality of conceptual edits. In this work, we extend previous endeavors on graph edits as counterfactual explanations by conducting a comparative study which encompasses both supervised and unsupervised Graph Neural Network (GNN) approaches. To this end, we pose the following significant research question: should we represent input data as graphs, which is the optimal GNN approach in terms of performance and time efficiency to generate minimal and meaningful counterfactual explanations for black-box image classifiers?
What problem does this paper attempt to address?
This paper addresses how to generate minimal and meaningful conceptual counterfactual explanations for black-box image classifiers by means of graph edits. Specifically, the authors represent input data as graphs and investigate which graph neural network (GNN) method is optimal, in terms of performance and time efficiency, for generating these explanations.
### Problem Background
1. **Trust and Explainability**: With the emergence of large and complex neural models, ensuring trust between the model and humans has become a crucial issue. The explainability literature has proposed multiple methods to explore the behavior of neural models, including methods that require access to the internal workings of the model (white-box explanations) and methods that do not (black-box explanations).
2. **Importance of Conceptual Counterfactual Explanations**: Low-level features (such as pixel brightness or contrast) may not provide meaningful explanations for end users, so semantic information is crucial for generating meaningful counterfactual explanations. The authors therefore focus on conceptual counterfactual explanations, that is, explaining a change in the classification result by changing concepts in the image (such as objects or scene elements).
3. **Requirement for Black-Box Explanations**: Since some powerful models (such as ChatGPT and GPT-4) can only be accessed through APIs, black-box explanations have become a more feasible and general solution. In this setting, the authors study how to generate effective counterfactual explanations without accessing the internal structure of the model.
### Research Questions
The authors pose the following research question:
- When input data is represented as graphs, which graph neural network (GNN) method is optimal in terms of performance and time efficiency for generating minimal and meaningful conceptual counterfactual explanations?
### Solutions
To answer this question, the authors carried out the following tasks:
1. **Graph Representation and Editing**: Represent images as scene graphs and measure the differences between scene graphs of different classes using graph edit distance (GED).
2. **Application of Graph Machine Learning Algorithms**: Use graph kernels, graph autoencoders (GAEs), and supervised graph neural networks (GNNs) to accelerate GED computation and generate counterfactual explanations.
3. **Experiments and Evaluations**: Compare the performance of the different methods through quantitative and qualitative experiments, and evaluate their ability to generate minimal and meaningful counterfactual explanations.
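
To make the first step above concrete, here is a minimal sketch of exact GED between two tiny scene graphs, followed by a nearest-neighbour lookup of a counterfactual. It is an illustration under stated assumptions, not the authors' implementation: the graph encoding (a dict of node labels plus a dict of labelled directed edges), the unit edit costs, the toy scene graphs, the class names, and the helper names `graph_edit_distance` and `nearest_counterfactual` are all hypothetical. The paper derives conceptual distances from knowledge graphs, which this sketch replaces with a flat cost of 1 per edit.

```python
from itertools import permutations

def graph_edit_distance(g1, g2):
    """Exact GED with unit costs, by brute force over all node mappings.

    A graph is (nodes, edges): nodes maps node id -> concept label,
    edges maps (source id, target id) -> relation label.
    Only feasible for very small graphs (factorial running time).
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    # Pad both sides with dummy slots (None) so that every node can be
    # deleted (mapped to None) or inserted (mapped from None).
    ids1 = list(nodes1) + [None] * len(nodes2)
    ids2 = list(nodes2) + [None] * len(nodes1)
    best = float("inf")
    for perm in permutations(range(len(ids2))):
        mapping = [(ids1[i], ids2[j]) for i, j in enumerate(perm)]
        cost = 0
        for u, v in mapping:
            if u is None and v is None:
                continue                      # dummy to dummy: free
            if u is None or v is None:
                cost += 1                     # node insertion or deletion
            elif nodes1[u] != nodes2[v]:
                cost += 1                     # node relabelling
        fwd = {u: v for u, v in mapping if u is not None}
        matched = set()
        for (a, b), label in edges1.items():
            image = (fwd[a], fwd[b])
            if image in edges2:
                matched.add(image)
                if label != edges2[image]:
                    cost += 1                 # edge relabelling
            else:
                cost += 1                     # edge deletion
        cost += len(edges2) - len(matched)    # edge insertions
        best = min(best, cost)
    return best

def nearest_counterfactual(query, query_class, candidates):
    """Return (name, distance) of the closest graph with a different class.

    candidates maps a name to (graph, predicted class); the predicted
    classes stand in for the outputs of the black-box classifier.
    """
    scored = [
        (graph_edit_distance(query, graph), name)
        for name, (graph, cls) in candidates.items()
        if cls != query_class
    ]
    distance, name = min(scored)
    return name, distance

# Toy scene graphs (hypothetical data).
horse_scene = ({"a": "person", "b": "horse", "c": "field"},
               {("a", "b"): "riding", ("b", "c"): "on"})
bike_scene = ({"x": "person", "y": "bicycle", "z": "road"},
              {("x", "y"): "riding", ("y", "z"): "on"})
dog_scene = ({"p": "dog", "q": "sofa"},
             {("p", "q"): "on"})

candidates = {
    "bike_scene": (bike_scene, "street"),
    "dog_scene": (dog_scene, "living room"),
}
print(nearest_counterfactual(horse_scene, "countryside", candidates))
```

The brute-force search enumerates every node mapping, so its cost grows factorially with graph size. Computing exact GED is NP-hard in general, which is why the second step turns to graph kernels, GAEs, and supervised GNNs to approximate it.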
### Main Contributions
- Show that both unsupervised (GAEs) and supervised (GNNs) methods can provide meaningful and accurate conceptual counterfactual explanations in a black-box setting.
- Show the trade-off between unsupervised and supervised methods and provide an analysis of the advantages of each method.
In summary, this paper aims to generate conceptual counterfactual explanations through graph edits to improve the explainability of black-box image classifiers and explore the performance of different graph machine learning algorithms in this task.