Wolfgang Stammer, Felix Friedrich, David Steinmann, Manuel Brack, Hikaru Shindo, Kristian Kersting
Abstract: Much of explainable AI research treats explanations as a means for model inspection. Yet, this neglects findings from human psychology that describe the benefit of self-explanations in an agent's learning process. Motivated by this, we introduce a novel workflow in the context of image classification, termed Learning by Self-Explaining (LSX). LSX utilizes aspects of self-refining AI and human-guided explanatory machine learning. The underlying idea is that a learner model, in addition to optimizing for the original predictive task, is further optimized based on explanatory feedback from an internal critic model. Intuitively, a learner's explanations are considered "useful" if the internal critic can perform the same task given these explanations. We provide an overview of important components of LSX and, based on this, perform extensive experimental evaluations via three different example instantiations. Our results indicate improvements via Learning by Self-Explaining on several levels: in terms of model generalization, reducing the influence of confounding factors, and providing more task-relevant and faithful model explanations. Overall, our work provides evidence for the potential of self-explaining within the learning phase of an AI model.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
This paper introduces a new workflow, Learning by Self-Explaining (LSX), to address the following problems:
1. **Model generalization ability**: Existing Explainable AI (XAI) research mainly treats explanations as a means of model inspection, overlooking findings from human psychology on the benefit of self-explanation during learning. LSX aims to improve the model's generalization through self-explanation, so that it performs better on unseen data.
2. **Reducing the influence of confounding factors**: A model may learn spurious correlations (confounding factors) that are present in the training dataset but absent from the test dataset. LSX uses self-explanation to reduce the influence of these confounding factors, thereby improving the robustness and reliability of the model.
3. **Providing more relevant and faithful explanations**: Existing explanation methods may fail to capture the key factors behind a model's decisions. LSX evaluates the quality of explanations through an internal critic model, with the goal that the explanations provided by the model are relevant to the task and faithful to its decision-making process.
### Specific implementation
The LSX workflow includes four core modules:
1. **Fit**: The learner model is first optimized on the base task, such as image classification. The goal of this stage is to make the learner model achieve good performance on the base task.
2. **Explain**: The learner model provides explanations for its predictions. These explanations can be, for example, input attributions, logical statements, or natural-language explanations.
3. **Reflect**: The internal critic model evaluates the quality of the explanations provided by the learner model. Specifically, the critic model attempts to use these explanations to complete the base task and gives feedback based on its performance.
4. **Revise**: The learner model adjusts according to the feedback from the critic model to improve its explanation quality and prediction performance.
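The four modules above can be sketched as a small training loop. This is a minimal toy illustration under stated assumptions, not the paper's implementation: the logistic learner, the weight-times-input attributions, the nearest-centroid critic, the random-search revise step, and all function names (`fit`, `explain`, `critic_loss`, `revise`, `lsx`) are hypothetical choices made for this example.

```python
import math
import random

def predict(w, x):
    """Learner (assumed): logistic model returning P(y=1 | x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def task_loss(w, data):
    """Mean cross-entropy of the learner on the base task."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(w, x), 1e-9), 1 - 1e-9)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def fit(w, data, lr=0.5, steps=200):
    """Fit: plain gradient descent on the base task."""
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def explain(w, x):
    """Explain: input attribution, here simply weight * input per feature."""
    return [wi * xi for wi, xi in zip(w, x)]

def critic_loss(w, data):
    """Reflect: a nearest-centroid critic tries to solve the task from the
    explanations alone; its error rate is the feedback.
    Assumes both classes (0 and 1) occur in data."""
    expl = [(explain(w, x), y) for x, y in data]
    centroids = {}
    for c in (0, 1):
        rows = [e for e, y in expl if y == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    wrong = sum(1 for e, y in expl
                if min((0, 1), key=lambda c: dist(e, centroids[c])) != y)
    return wrong / len(expl)

def joint_loss(w, data, lam):
    """Base-task loss plus weighted explanatory feedback from the critic."""
    return task_loss(w, data) + lam * critic_loss(w, data)

def revise(w, data, lam, rng, tries=50, scale=0.1):
    """Revise: random search that keeps a perturbation only if it lowers
    the joint loss (a stand-in for gradient-based fine-tuning)."""
    best = joint_loss(w, data, lam)
    for _ in range(tries):
        cand = [wj + rng.gauss(0.0, scale) for wj in w]
        score = joint_loss(cand, data, lam)
        if score < best:
            w, best = cand, score
    return w

def lsx(data, n_features, rounds=3, lam=1.0, seed=0):
    rng = random.Random(seed)
    w = fit([0.0] * n_features, data)   # Fit
    for _ in range(rounds):
        # Explain and Reflect are evaluated inside joint_loss; Revise updates w.
        w = revise(w, data, lam, rng)
    return w

def accuracy(w, data):
    return sum((predict(w, x) > 0.5) == (y == 1) for x, y in data) / len(data)

# Toy data: label depends on the sign of the first feature; last entry is a bias.
data = [([1.0, 0.2, 1.0], 1), ([2.0, -0.3, 1.0], 1), ([1.5, 0.5, 1.0], 1),
        ([-1.0, 0.1, 1.0], 0), ([-2.0, -0.4, 1.0], 0), ([-1.5, 0.3, 1.0], 0)]
w_fit = fit([0.0, 0.0, 0.0], data)
w_lsx = lsx(data, n_features=3)
```

Because `revise` only accepts candidates that lower the joint loss, `joint_loss(w_lsx, data, 1.0)` can never exceed `joint_loss(w_fit, data, 1.0)` in this sketch. A realistic instantiation would replace the random search with a differentiable explanation loss and a trained critic.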
### Experimental verification
The paper verifies the effectiveness of LSX through multiple experiments, including:
- **Generalization ability on different datasets**: On datasets such as MNIST, ChestMNIST, CLEVR-Hans3, and CUB-10, the paper reports that LSX improves the generalization performance of the model across different training set sizes.
- **Reducing the influence of confounding factors**: On confounded datasets (DecoyMNIST and CLEVR-Hans3), models trained with LSX are less affected by the confounding factors.
- **Consistency and faithfulness of explanations**: Using indicators such as cluster analysis and the classification accuracy of a linear model, the paper finds that the explanations generated by LSX are more consistent and more faithful to the model's decision-making process.
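One way to read the "linear model classification accuracy" indicator is as a separability check: if a linear classifier trained on the explanations alone can recover the class labels, the explanations carry class-specific information. The sketch below illustrates that idea with a simple perceptron; the perceptron, the toy explanation vectors, and the function names are assumptions made for illustration, not the evaluation protocol used in the paper.

```python
def train_perceptron(expl, labels, epochs=20):
    """Train a perceptron (with bias term) on explanation vectors."""
    w = [0.0] * (len(expl[0]) + 1)
    for _ in range(epochs):
        for e, y in zip(expl, labels):
            x = list(e) + [1.0]  # append bias input
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if pred != y:
                sign = 1.0 if y == 1 else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, x)]
    return w

def linear_accuracy(expl, labels):
    """Accuracy of a linear classifier fit on the explanations themselves
    (training accuracy, to keep the sketch short)."""
    w = train_perceptron(expl, labels)
    correct = 0
    for e, y in zip(expl, labels):
        x = list(e) + [1.0]
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        correct += pred == y
    return correct / len(labels)

labels = [1, 1, 1, 0, 0, 0]
# Hypothetical class-specific explanations: linearly separable by class.
informative = [[2.0, 0.1], [1.5, -0.2], [2.5, 0.3],
               [-2.0, 0.2], [-1.5, -0.1], [-2.5, 0.0]]
# Hypothetical uninformative explanations: identical for every sample.
uninformative = [[1.0, 1.0] for _ in labels]

acc_informative = linear_accuracy(informative, labels)
acc_uninformative = linear_accuracy(uninformative, labels)
```

In this toy setup the separable explanations are classified perfectly, while the identical explanations cannot do better than chance on the balanced labels. A practical evaluation would measure accuracy on held-out explanations rather than on the training set.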
### Summary
Overall, the paper's results indicate that LSX can improve model generalization, reduce the influence of confounding factors, and yield more task-relevant and faithful explanations. Beyond the gains in model performance, these findings suggest that explanations are useful within the learning phase itself, not only as a tool for inspecting a trained model.