Abstract: With the widespread application of Deep Learning (DL), its black-box nature raises concerns, especially in high-stakes decision-making fields such as autonomous driving. Consequently, there is a growing demand for research on the interpretability of DL, and eXplainable Artificial Intelligence (XAI) has become a research hotspot. Current research on DL interpretability focuses primarily on transparency and post-hoc interpretability. Improving interpretability through transparency often requires targeted modifications to the model structure, which can compromise the model's accuracy. In contrast, post-hoc interpretability usually requires no adjustments to the model itself. To provide fast and accurate counterfactual explanations of DL without compromising its performance, this paper proposes a post-hoc interpretation method, relevance inference based on direct contribution, that applies counterfactual reasoning to DL. In this method, direct contribution is first designed by improving Layer-wise Relevance Propagation to measure the relevance between the outputs and the inputs. Counterfactual examples are then produced based on direct contribution, and counterfactual results for the DL model are obtained from these examples. These counterfactual results effectively describe the behavioral boundaries of the model, facilitating a better understanding of its behavior. Additionally, direct contribution offers an easily implementable interpretable analysis method for studying model behavior. Experiments conducted on various datasets demonstrate that relevance inference generates counterfactual examples more efficiently and accurately than state-of-the-art methods, aiding the analysis of behavioral boundaries in intelligent decision-making models for vehicles.
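
To make the relevance-then-counterfactual idea concrete, the following is a minimal sketch under strong simplifying assumptions, not the paper's direct-contribution measure, which the abstract does not specify. It assumes a single linear binary classifier, uses the basic Layer-wise Relevance Propagation rule for one linear layer (relevance of input i is w_i * x_i), and builds a counterfactual by greedily resetting the most relevant inputs to a baseline until the predicted class flips. The names `relevance`, `predict`, `greedy_counterfactual`, and the zero baseline are hypothetical and chosen only for illustration.

```python
import numpy as np


def relevance(w, x):
    """Basic LRP rule for one linear layer: R_i = w_i * x_i (illustrative only)."""
    return w * x


def predict(w, b, x):
    """Binary prediction of a linear classifier: True if the score is positive."""
    return bool(w @ x + b > 0)


def greedy_counterfactual(w, b, x, baseline=0.0):
    """Reset the inputs that most support the current prediction to `baseline`,
    one at a time, until the predicted class changes (hypothetical procedure)."""
    x_cf = np.asarray(x, dtype=float).copy()
    original = predict(w, b, x_cf)
    direction = 1.0 if original else -1.0
    # Sort inputs by how strongly they support the currently predicted class.
    order = np.argsort(-direction * relevance(w, x_cf))
    changed = []
    for i in order:
        if predict(w, b, x_cf) != original:
            break
        x_cf[i] = baseline
        changed.append(int(i))
    flipped = predict(w, b, x_cf) != original
    return x_cf, changed, flipped


# Toy example with made-up weights and input.
w = np.array([2.0, -1.0, 0.5])
b = -0.5
x = np.array([1.0, 0.5, 1.0])

x_cf, changed, flipped = greedy_counterfactual(w, b, x)
print(relevance(w, x))  # per-input relevance scores
print(x_cf, changed, flipped)
```

In this toy case, resetting only the most relevant input is enough to flip the prediction, so the counterfactual marks where the model's decision boundary lies relative to the original input. A deep network would need relevance propagated layer by layer, and the search may fail to flip the class, which is why the sketch returns a `flipped` flag.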

An Interpretable Deep Classifier for Counterfactual Generation

Interactive Counterfactual Generation for Univariate Time Series

Understanding Counterfactual Generation Using Maximum Mean Discrepancy

Relevance Inference Based on Direct Contribution: Counterfactual Explanation to Deep Networks for Intelligent Decision-making

Interpretable Credit Application Predictions With Counterfactual Explanations

Viewing the process of generating counterfactuals as a source of knowledge: a new approach for explaining classifiers

Causality-based Counterfactual Explanation for Classification Models

ViCE: Visual Counterfactual Explanations for Machine Learning Models

Generating Counterfactual Explanations with Natural Language

Causal Generative Explainers using Counterfactual Inference: A Case Study on the Morpho-MNIST Dataset

Counterfactual Generative Modeling with Variational Causal Inference

Counterfactual Generation with Identifiability Guarantees

An Interpretable Deep Bayesian Model for Facial Micro-Expression Recognition

Latent-CF: A Simple Baseline for Reverse Counterfactual Explanations

The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations

Multi-round Counterfactual Generation: Interpreting and Improving Models of Text Classification

Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification

Self-Interpretable Time Series Prediction with Counterfactual Explanations

Explaining Text Classifiers with Counterfactual Representations

TABCF: Counterfactual Explanations for Tabular Data Using a Transformer-Based VAE