Abstract: Recently, several methods have leveraged deep generative modeling to produce example-based explanations of decision algorithms for high-dimensional input data. Despite promising results, a disconnect exists between these methods and the classical explainability literature, which focuses on lower-dimensional data with semantically meaningful features. This conceptual and communication gap leads to misunderstandings and misalignments in goals and expectations. In this paper, we bridge this gap by proposing a novel probabilistic framework for local example-based explanations. Our framework integrates the critical characteristics of classical local explanation desiderata while being amenable to high-dimensional data and their modeling through deep generative models. Our aim is to facilitate communication, foster rigor and transparency, and improve the quality of peer discussion and research progress.

What problem does this paper attempt to address?

The problem this paper addresses is: **bridging the gap between generative models and interpretability, in particular generating example-based explanations for high-dimensional data (such as images)**.

### Problem Background

With the wide application of deep learning and artificial intelligence in daily life, it is crucial to ensure the trustworthiness, security, and ethical use of these systems. The field of Explainable Artificial Intelligence (XAI) aims to enhance trust and transparency in algorithmic decision-making by providing explanations. One popular approach is local explanation, which explains the algorithm's decision for a specific input data point. However, explanation methods designed for low-dimensional data do not transfer directly to high-dimensional data such as images: modifying pixels directly can distort the image and produce changes that are difficult to interpret.

### Core Problems of the Paper

The paper points out that conceptual and communication gaps currently exist between generative modeling and interpretability research, leading to misunderstandings and inconsistent goals. Specifically:

1. **Differences between low-dimensional and high-dimensional data**: Classical interpretability literature mainly focuses on low-dimensional data with semantically meaningful features, while generative models are more often used to process high-dimensional data (such as images). This difference has led to a disconnect between the two.
2. **Fidelity problem of generative models**: A key challenge when generating explanations for high-dimensional data is ensuring the fidelity of the generated examples. The generated samples should lie as close as possible to the real data distribution rather than being mere adversarial examples.

### Solutions

To solve the above problems, the paper proposes a new probabilistic framework for generating local example-based explanations.
The main contributions of this framework include:

1. **Define three types of explanation samples**:
   - **Counterfactual Explanations**: Generate a high-fidelity sample that is close to the original sample and changes the algorithm's decision.
   - **Affirmative Explanations**: Generate a high-fidelity sample that is close to the counterfactual sample and maintains the original decision, to reconfirm the user's understanding.
   - **Adversarial Examples**: Generate a low-fidelity sample that is close to the original sample but changes the algorithm's decision.
2. **Formal Definition**: Formalize the above concepts into mathematical definitions and introduce fidelity as a measurement standard. For example, a counterfactual explanation can be posed as the following objective:

   \[ \text{minimize} \quad d(\hat{x}, x^*)+\lambda(f_\theta(\hat{x}) - y_t)^2 \]

   where \(d(\hat{x}, x^*)\) is the distance between the generated sample \(\hat{x}\) and the original sample \(x^*\), \(\lambda\) is a trade-off parameter, \(f_\theta(\hat{x})\) is the algorithm's prediction for the generated sample, and \(y_t\) is the target label.
3. **Optimization Problem of Generating Explanations**: Formalize the process of generating explanations as an optimization problem, combining the capabilities of generative models to ensure the fidelity of the generated samples.
4. **Quantitative Evaluation Scheme**: Propose a set of quantitative evaluation metrics, including closeness, validity, and fidelity, to evaluate the quality of the generated explanation samples.

### Experimental Verification

The paper evaluates the proposed framework experimentally on synthetic datasets (such as SportBalls) and real-world datasets (such as CelebA). The experimental results show that effective counterfactual explanations are obtained only when the three conditions of closeness, validity, and fidelity are met simultaneously.
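
The counterfactual objective \(d(\hat{x}, x^*)+\lambda(f_\theta(\hat{x}) - y_t)^2\) and the three evaluation criteria can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the classifier is a hypothetical 2-D logistic regression, \(d\) is squared Euclidean distance, the optimizer is plain gradient descent, and fidelity is approximated by the log-density of an assumed standard normal data model rather than a deep generative model.

```python
import math

# Hypothetical toy classifier: logistic regression on 2-D inputs (an assumption).
W = (2.0, -1.0)
B = 0.0
LAM = 50.0  # trade-off parameter lambda (arbitrary illustrative value)


def f_theta(x):
    """Probability of class 1 under the toy classifier."""
    z = W[0] * x[0] + W[1] * x[1] + B
    return 1.0 / (1.0 + math.exp(-z))


def distance(a, b):
    """Squared Euclidean distance, standing in for d."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def objective(x_hat, x_star, y_t, lam):
    """d(x_hat, x_star) + lam * (f_theta(x_hat) - y_t)^2."""
    return distance(x_hat, x_star) + lam * (f_theta(x_hat) - y_t) ** 2


def gradient(x_hat, x_star, y_t, lam):
    """Analytic gradient of the objective with respect to x_hat."""
    p = f_theta(x_hat)
    # Chain rule: d/dx of lam * (p - y_t)^2, using dp/dx = p * (1 - p) * W.
    scale = 2.0 * lam * (p - y_t) * p * (1.0 - p)
    return tuple(
        2.0 * (xh - xs) + scale * w for xh, xs, w in zip(x_hat, x_star, W)
    )


def find_counterfactual(x_star, y_t, lam=LAM, lr=0.01, steps=2000):
    """Gradient descent on the objective, starting from the original sample."""
    x_hat = tuple(x_star)
    for _ in range(steps):
        g = gradient(x_hat, x_star, y_t, lam)
        x_hat = tuple(xh - lr * gi for xh, gi in zip(x_hat, g))
    return x_hat


def log_density(x):
    """Fidelity proxy: log-density under an ASSUMED standard normal data model."""
    return -0.5 * sum(xi ** 2 for xi in x) - 0.5 * len(x) * math.log(2 * math.pi)


def evaluate(x_hat, x_star, y_t):
    """Closeness, validity, and fidelity for a candidate explanation."""
    return {
        "closeness": distance(x_hat, x_star),                # smaller is closer
        "validity": (f_theta(x_hat) > 0.5) == (y_t > 0.5),   # decision matches target
        "fidelity": log_density(x_hat),                      # larger is more plausible
    }


x_star = (-1.0, 0.5)  # original sample, classified as class 0 by the toy model
x_cf = find_counterfactual(x_star, y_t=1.0)
metrics = evaluate(x_cf, x_star, y_t=1.0)
print(x_cf, metrics)
```

In this sketch nothing in the objective itself rewards fidelity, which is why the summary's third contribution brings in a generative model. A sample that scores well on closeness and validity but poorly on a fidelity measure would correspond to an adversarial example rather than a counterfactual explanation.
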
In conclusion, this paper aims to bridge the gap between generative models and interpretability by introducing a new probabilistic framework, thereby improving the quality and reliability of explanations for high-dimensional data.

Generative Example-Based Explanations: Bridging the Gap between Generative Modeling and Explainability
