Abstract: Despite recent progress in deep neural networks (DNNs), explaining their predictions remains challenging. Existing explanation methods for DNNs mainly focus on post-hoc explanations, in which a separate explanatory model is employed to provide explanations. Because post-hoc methods can fail to reveal the actual reasoning process of a DNN, there is a need for DNNs with built-in interpretability. Motivated by this, many self-explaining neural networks have been proposed to generate not only accurate predictions but also clear and intuitive insights into why a particular decision was made. However, existing self-explaining networks are limited in providing distribution-free uncertainty quantification for the two simultaneously generated outputs (i.e., a sample's final prediction and the corresponding explanations for interpreting that prediction). Importantly, they also fail to establish a connection between the confidence values assigned to the generated explanations in the interpretation layer and those assigned to the final predictions in the prediction layer. To address these challenges, we design a novel uncertainty modeling framework for self-explaining networks, which not only demonstrates strong distribution-free uncertainty modeling performance for the generated explanations in the interpretation layer but also produces efficient and effective prediction sets for the final predictions based on the informative high-level basis explanations. We provide a theoretical analysis of the proposed framework, and extensive experimental evaluation demonstrates its effectiveness.

A Framework for Counterfactual Explanation of Predictive Uncertainty in Multimodal Models

Counterfactual explanation of Bayesian model uncertainty

A Trustworthy Counterfactual Explanation Method With Latent Space Smoothing

Calibrated Explanations: with Uncertainty Information and Counterfactuals

Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations

MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation

A Framework for Feasible Counterfactual Exploration incorporating Causality, Sparsity and Density

Explainability meets uncertainty quantification: Insights from feature-based model fusion on multimodal time series

Getting a CLUE: A Method for Explaining Uncertainty Estimates

CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models

Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels

Introducing User Feedback-Based Counterfactual Explanations (UFCE)

Towards Modeling Uncertainties of Self-explaining Neural Networks via Conformal Prediction

Multi-Objective Counterfactual Explanations

Explainability through uncertainty: Trustworthy decision-making with neural networks

Faithful Counterfactual Visual Explanations (FCVE)

A Counterfactual Explanation Framework for Retrieval Models

Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence