Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

Silvan Mertes, Tobias Huber, Christina Karle, Katharina Weitz, Ruben Schlagowski, Cristina Conati, Elisabeth André
2024-05-08
Abstract: In this paper, we demonstrate the feasibility of alterfactual explanations for black-box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field communicate information about features or characteristics that are especially important for an AI's decision. Yet to fully understand a decision, knowing the relevant features is not enough: awareness of irrelevant information also contributes substantially to a user's mental model of an AI system. To this end, a novel approach for explaining AI systems, called alterfactual explanations, was recently proposed on a conceptual level. It is based on showing an alternative reality in which irrelevant features of an AI's input are altered. By doing so, the user directly sees which characteristics of the input data can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that this idea can be applied to black-box models based on neural networks. Specifically, we present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives insights into how alterfactual explanations can complement counterfactual explanations.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the following problem: how to improve users' understanding of the decisions made by black-box image classifiers by generating "alterfactual explanations". Traditional Explainable Artificial Intelligence (XAI) methods, such as counterfactual explanations, have mainly focused on showing which input features are particularly important for an AI's decision. However, these methods neglect information about irrelevant features, which is also crucial for users to fully understand how an AI system reaches its decisions. The paper therefore builds on a recently proposed explanation paradigm, alterfactual explanations, which alter the irrelevant features of the input data and show users that these changes do not affect the AI's decision, giving them a more complete picture of the system's decision-making. Specifically, the paper proposes a method based on Generative Adversarial Networks (GANs) to generate alterfactual and counterfactual explanations for binary image classifiers. In this way, the explanations show not only which features are crucial for a decision but also which features are irrelevant, providing a more comprehensive explanation. In addition, the paper evaluates alterfactual explanations in a user study, which finds that they are superior to counterfactual explanations in terms of local model understanding and that, when combined with counterfactual explanations, they help users identify relevant and irrelevant features more effectively.
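To make the difference between the two explanation types concrete, here is a minimal sketch under a strong simplifying assumption: a hypothetical linear binary classifier f(x) = w·x + b instead of a neural network, and closed-form geometry instead of the paper's GAN-based generator. For such a model, a counterfactual can be obtained by a minimal step along the weight vector w that just crosses the decision boundary, while an alterfactual can be obtained by a large step in a direction orthogonal to w, which changes the input substantially but leaves the score, and therefore the decision, unchanged. The function names `score`, `predict`, `counterfactual` and `alterfactual` and the parameters `margin`, `strength` and `seed` are illustrative assumptions, not taken from the paper.

```python
import math
import random

# Hypothetical toy model: a linear binary classifier f(x) = w.x + b.
# This is NOT the paper's GAN-based method; it only illustrates the
# definitions of counterfactual and alterfactual explanations.


def score(w, b, x):
    """Signed decision score; the predicted class is 1 if it is positive."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0


def counterfactual(w, b, x, margin=1e-3):
    """Minimal L2 change that moves x just across the decision boundary.

    For a linear model, the closest point on the other side lies along w.
    Assumes score(w, b, x) != 0 (x is not exactly on the boundary).
    """
    s = score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    # Step size chosen so that the new score equals -sign(s) * margin.
    t = (s + math.copysign(margin, s)) / norm_sq
    return [xi - t * wi for xi, wi in zip(x, w)]


def alterfactual(w, b, x, strength=10.0, seed=0):
    """Move x by `strength` (L2 distance) along a direction orthogonal to w.

    Directions orthogonal to w are irrelevant to this classifier: moving
    along them leaves the score, and hence the decision, unchanged.
    """
    rng = random.Random(seed)
    d = [rng.gauss(0.0, 1.0) for _ in x]
    norm_sq = sum(wi * wi for wi in w)
    # Remove the component of the random direction that lies along w.
    coef = sum(di * wi for di, wi in zip(d, w)) / norm_sq
    d = [di - coef * wi for di, wi in zip(d, w)]
    d_norm = math.sqrt(sum(di * di for di in d))
    return [xi + strength * di / d_norm for xi, di in zip(x, d)]


if __name__ == "__main__":
    w, b = [2.0, 0.0, -1.0], 0.5  # feature 1 has zero weight: irrelevant
    x = [1.0, 3.0, 0.5]
    cf = counterfactual(w, b, x)
    af = alterfactual(w, b, x)
    print(predict(w, b, x), predict(w, b, cf), predict(w, b, af))  # 1 0 1
```

Note that the alterfactual here varies irrelevant directions rather than only the zero-weight feature, since any change orthogonal to w cancels out in the score. For a black-box neural image classifier no such closed-form direction is available, which is the gap the paper's GAN-based generator addresses.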