Abstract: This work presents CounterNet, a novel end-to-end learning framework that integrates Machine Learning (ML) model training and the generation of corresponding counterfactual (CF) explanations into a single pipeline. Counterfactual explanations offer a contrastive case: they attempt to find the smallest modification to the feature values of an instance that changes the ML model's prediction on that instance to a predefined output. Prior techniques for generating CF explanations suffer from two major limitations: (i) all of them are post-hoc methods designed for use with proprietary ML models, so their procedure for generating CF explanations is uninformed by the training of the ML model, which leads to misalignment between model predictions and explanations; and (ii) most of them rely on solving a separate, time-intensive optimization problem for each input data point, which hurts their runtime. CounterNet departs from this prevalent post-hoc paradigm: unlike post-hoc methods, it optimizes CF explanation generation only once, together with the predictive model. We adopt a block-wise coordinate descent procedure that helps train CounterNet's network effectively. Our extensive experiments on multiple real-world datasets show that CounterNet generates high-quality predictions, consistently achieves 100% CF validity and low proximity scores for any new input instance (thereby achieving a well-balanced cost-invalidity trade-off), and runs 3X faster than existing state-of-the-art baselines.
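To make the two evaluation quantities named in the abstract concrete, here is a minimal sketch of how CF validity and proximity might be computed. It is not CounterNet's implementation: the toy classifier `predict`, the helper names `validity` and `proximity`, and the sample data are hypothetical, and it assumes validity means "the counterfactual receives a different label than the original instance" and proximity means "mean L1 distance between an instance and its counterfactual".

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def predict(x: Vector) -> int:
    """Hypothetical toy classifier: label 1 if the feature sum exceeds 1.0."""
    return 1 if sum(x) > 1.0 else 0


def validity(model: Callable[[Vector], int],
             inputs: Sequence[Vector],
             cfs: Sequence[Vector]) -> float:
    """Fraction of counterfactuals whose label differs from the original's (assumed definition)."""
    flipped = sum(1 for x, cf in zip(inputs, cfs) if model(x) != model(cf))
    return flipped / len(inputs)


def proximity(inputs: Sequence[Vector], cfs: Sequence[Vector]) -> float:
    """Mean L1 distance between each input and its counterfactual (assumed definition)."""
    total = sum(
        sum(abs(a - b) for a, b in zip(x, cf))
        for x, cf in zip(inputs, cfs)
    )
    return total / len(inputs)


# Made-up instances and hand-picked counterfactuals, for illustration only.
inputs = [[0.2, 0.3], [0.9, 0.6]]
cfs = [[0.2, 0.9], [0.9, 0.0]]

val = validity(predict, inputs, cfs)
prox = proximity(inputs, cfs)
print(f"validity={val:.2f}, proximity={prox:.2f}")
```

In this toy setup both counterfactuals flip the label by changing a single feature, so validity is at its maximum while proximity stays small. That is the kind of balance the abstract calls a cost-invalidity trade-off.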

CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations
