Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

Silvan Mertes, Tobias Huber, Christina Karle, Katharina Weitz, Ruben Schlagowski, Cristina Conati, Elisabeth André

2024-05-08

Abstract: In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. Yet fully understanding a decision requires more than knowledge about relevant features: awareness of irrelevant information also contributes substantially to a user's mental model of an AI system. To this end, a novel approach for explaining AI systems called alterfactual explanations was recently proposed on a conceptual level. It is based on showing an alternative reality in which irrelevant features of an AI's input are altered. By doing so, the user directly sees which input data characteristics can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that it is possible to apply this idea to black box models based on neural networks. We present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives insights into how alterfactual explanations can complement counterfactual explanations.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper addresses how to improve users' understanding of the decisions made by black-box image classifiers by generating "alterfactual explanations". Traditionally, Explainable Artificial Intelligence (XAI) methods have mainly focused on showing which input features are particularly important for an AI's decision, as "counterfactual explanations" do. These methods leave out information about irrelevant features, which users also need in order to fully understand how an AI system reaches its decisions. The paper therefore pursues a new explanatory paradigm, "alterfactual explanations", which alters the irrelevant features of an input and shows users that these changes do not affect the AI's decision.

Specifically, the paper proposes a method based on Generative Adversarial Networks (GANs) to generate alterfactual and counterfactual explanations for binary image classifiers. In this way, researchers can show not only which features are crucial for a decision but also which features are irrelevant, giving a more comprehensive explanation. The paper also evaluates alterfactual explanations in a user study and reports that they are superior to counterfactual explanations in terms of local model understanding, and that, when combined with counterfactual explanations, they help users identify relevant and irrelevant features more effectively.
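The distinction between the two explanation types can be illustrated with a minimal sketch. This is not the paper's GAN-based method: the two-feature "classifier", the threshold of 0.5, the helper `keeps_decision`, and the example inputs are all hypothetical and chosen only to show that an alterfactual changes an irrelevant feature without changing the decision, whereas a counterfactual changes a relevant feature so that the decision flips.

```python
from typing import Callable, List

Classifier = Callable[[List[float]], int]


def toy_classifier(x: List[float]) -> int:
    """Hypothetical black box: the decision depends on x[0] only; x[1] is ignored."""
    return 1 if x[0] > 0.5 else 0


def keeps_decision(clf: Classifier, original: List[float], altered: List[float]) -> bool:
    """Return True if the altered input receives the same label as the original."""
    return clf(altered) == clf(original)


original = [0.8, 0.1]

# Alterfactual-style input: the irrelevant feature x[1] is changed substantially,
# yet the decision stays the same.
alterfactual = [0.8, 0.9]

# Counterfactual-style input: the relevant feature x[0] is changed so that
# the decision flips.
counterfactual = [0.4, 0.1]

print(keeps_decision(toy_classifier, original, alterfactual))    # True
print(keeps_decision(toy_classifier, original, counterfactual))  # False
```

For a real image classifier the relevant and irrelevant features are not known in advance, which is why the paper relies on a learned generator rather than hand-picked edits like those above.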

Alterfactual Explanations -- The Relevance of Irrelevance for Explaining AI Systems

Generating Counterfactual Explanations with Natural Language

On Generating Plausible Counterfactual and Semi-Factual Explanations for Deep Learning

Causal Generative Explainers using Counterfactual Inference: A Case Study on the Morpho-MNIST Dataset

NoMatterXAI: Generating "No Matter What" Alterfactual Examples for Explaining Black-Box Text Classification Models

Causal Explanations for Image Classifiers

Counterfactual Image Generation for adversarially robust and interpretable Classifiers

Foiling Explanations in Deep Neural Networks

Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals

TDLS: A Top-Down Layer Searching Algorithm for Generating Counterfactual Visual Explanation

Visual Explanations with Attributions and Counterfactuals on Time Series Classification

Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations

Accurate Explanation Model for Image Classifiers using Class Association Embedding

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

Viewing the process of generating counterfactuals as a source of knowledge: a new approach for explaining classifiers

Explainable AI without Interpretable Model

DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations

Explaining the Black-box Smoothly- A Counterfactual Approach