Abstract: This work presents CounterNet, a novel end-to-end learning framework that integrates Machine Learning (ML) model training and the generation of corresponding counterfactual (CF) explanations into a single pipeline. Counterfactual explanations offer a contrastive case: they attempt to find the smallest modification to the feature values of an instance that changes the ML model's prediction on that instance to a predefined output. Prior techniques for generating CF explanations suffer from two major limitations: (i) all of them are post-hoc methods designed for use with proprietary ML models, so their procedure for generating CF explanations is uninformed by the training of the ML model, which leads to misalignment between model predictions and explanations; and (ii) most of them rely on solving a separate, time-intensive optimization problem to find the CF explanation for each input data point, which hurts their runtime. CounterNet departs from this prevalent post-hoc paradigm: instead of solving an optimization problem per instance, it optimizes CF explanation generation only once, jointly with the predictive model. We adopt a block-wise coordinate descent procedure that helps train CounterNet's network effectively. Our extensive experiments on multiple real-world datasets show that CounterNet generates high-quality predictions, consistently achieves 100% CF validity and low proximity scores (thereby achieving a well-balanced cost-invalidity trade-off) for any new input instance, and runs 3× faster than existing state-of-the-art baselines.
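To make the definition of a counterfactual and the two evaluation criteria concrete, the sketch below uses a deliberately simple setting. It assumes a hypothetical linear binary classifier (predict class 1 when `w·x + b > 0`), for which the smallest L2 change that flips the prediction has a closed form: move `x` along `w` until the score lands just past the boundary. It then computes validity (fraction of CFs whose predicted class actually flips) and proximity (here assumed to be the mean L1 distance between each input and its CF). This is a post-hoc, per-instance illustration, not CounterNet's method; the names `minimal_l2_counterfactual`, `validity`, `proximity` and the `margin` parameter are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def score(w: Vector, b: float, x: Vector) -> float:
    """Decision score w.x + b of a hypothetical linear binary classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Class 1 if the score is positive, else class 0."""
    return 1 if score(w, b, x) > 0 else 0


def minimal_l2_counterfactual(w: Vector, b: float, x: Vector,
                              margin: float = 1e-3) -> List[float]:
    """Closest point to x (in L2) whose predicted class is flipped.

    Moves x along w so the new score equals -margin (to reach class 0)
    or +margin (to reach class 1), i.e. just past the decision boundary.
    """
    s = score(w, b, x)
    target = -margin if s > 0 else margin
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0:
        raise ValueError("w must be non-zero, otherwise no change flips the class")
    step = (s - target) / norm_sq
    return [xi - step * wi for xi, wi in zip(x, w)]


def validity(w: Vector, b: float, xs: Sequence[Vector],
             cfs: Sequence[Vector]) -> float:
    """Fraction of CFs whose prediction differs from the original input's
    (in the binary case, the flipped class is the desired output)."""
    flipped = sum(1 for x, cf in zip(xs, cfs) if predict(w, b, cf) != predict(w, b, x))
    return flipped / len(xs)


def proximity(xs: Sequence[Vector], cfs: Sequence[Vector]) -> float:
    """Mean L1 distance between inputs and their CFs (lower is better)."""
    total = sum(sum(abs(a - c) for a, c in zip(x, cf)) for x, cf in zip(xs, cfs))
    return total / len(xs)


if __name__ == "__main__":
    w, b = [2.0, -1.0], -0.5
    xs = [[1.0, 0.0], [0.0, 1.0]]
    cfs = [minimal_l2_counterfactual(w, b, x) for x in xs]
    print(validity(w, b, xs, cfs), proximity(xs, cfs))
```

The closed form exists only because the model is linear. For neural networks, post-hoc methods typically replace it with an iterative optimization run separately for every instance, which is the runtime cost described in limitation (ii); CounterNet instead trains a CF generator together with the predictor so that producing a CF for a new input needs no per-instance search.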

Multi-round Counterfactual Generation: Interpreting and Improving Models of Text Classification

A Comparative Analysis of Counterfactual Explanation Methods for Text Classifiers

Counterfactual Generation with Identifiability Guarantees

A Survey on Natural Language Counterfactual Generation

Generating Counterfactual Explanations with Natural Language

Multi-Objective Counterfactual Explanations

Empowering Language Understanding with Counterfactual Reasoning

Text Counterfactuals via Latent Optimization and Shapley-Guided Search

Interactive Counterfactual Generation for Univariate Time Series

Optimal and efficient text counterfactuals using Graph Neural Networks

A General Search-based Framework for Generating Textual Counterfactual Explanations

Model-Based Counterfactual Synthesizer for Interpretation

Causality-based Counterfactual Explanation for Classification Models

Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

An Interpretable Deep Classifier for Counterfactual Generation

Automatic Counterfactual Augmentation for Robust Text Classification Based on Word-Group Search

Explaining Text Classifiers with Counterfactual Representations

TIGTEC: Token Importance Guided TExt Counterfactuals

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations

Counterfactual Explanation Generation with Minimal Feature Boundary