Abstract: Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, when slight changes occur in the parameters of the underlying model, CEs found by existing methods often become invalid for the updated models. The literature lacks a way to certify deterministic robustness guarantees for CEs under model changes: existing methods for improving the robustness of CEs are heuristic, and their robustness is evaluated empirically using only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees for CEs under a possibly infinite set of plausible model changes $\Delta$. We formalise our robustness notion as $\Delta$-robustness for CEs, in both binary and multi-class classification settings. We formulate procedures to verify $\Delta$-robustness based on Mixed Integer Linear Programming, and build on them to propose two algorithms that generate $\Delta$-robust CEs. In an extensive empirical study, we demonstrate how our approach can be used in practice by discussing two strategies for determining the appropriate hyperparameter in our method, and we quantitatively benchmark the CEs generated by eleven methods, highlighting the effectiveness of our algorithms in finding robust CEs.
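
To make the idea of an interval abstraction concrete, the following is a minimal sketch for the simplest possible case, a linear binary classifier, and not the paper's MILP-based procedure for neural networks. It assumes that the set of plausible model changes $\Delta$ contains every model whose weights and bias each differ from the original by at most `delta` (an assumption made here for illustration), and that the desired class corresponds to a positive logit. The names `linear_logit_bounds` and `is_delta_robust` are hypothetical.

```python
import numpy as np


def linear_logit_bounds(w, b, x, delta):
    """Bounds on w'.x + b' over all (w', b') with
    |w'_i - w_i| <= delta for every i and |b' - b| <= delta.

    Assumption: Delta is this axis-aligned box around the original parameters.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    center = float(np.dot(w, x) + b)
    # Each weight can move the logit by at most delta * |x_i|,
    # and the bias by at most delta.
    radius = delta * (float(np.sum(np.abs(x))) + 1.0)
    return center - radius, center + radius


def is_delta_robust(w, b, x_cf, delta):
    """True if the counterfactual x_cf keeps a positive logit
    (the desired class, by assumption) for every model in the box."""
    lower, _ = linear_logit_bounds(w, b, x_cf, delta)
    return lower > 0.0


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.5
    x_cf = [2.0, 0.5]
    print(is_delta_robust(w, b, x_cf, delta=0.1))
    print(is_delta_robust(w, b, x_cf, delta=0.5))
```

For a linear model the worst case over the box has a closed form, which is why no solver is needed in this sketch; for neural networks the intervals interact with the nonlinear activations, which is where a Mixed Integer Linear Programming formulation such as the one the abstract describes becomes necessary.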

Robust Counterfactual Explanations in Machine Learning: A Survey

Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation

Interval Abstractions for Robust Counterfactual Explanations

Evaluating Robustness of Counterfactual Explanations

A Survey on the Robustness of Feature Importance and Counterfactual Explanations

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

Finding Regions of Counterfactual Explanations via Robust Optimization

Generating robust counterfactual explanations

On the computation of counterfactual explanations -- A survey

Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations

Introducing User Feedback-Based Counterfactual Explanations (UFCE)

On the Robustness of Explanations of Deep Neural Network Models: A Survey

Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms

Counterfactual Explanations for Machine Learning: Challenges Revisited

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change

Flexible and Robust Counterfactual Explanations with Minimal Satisfiable Perturbations

Counterfactual explanations and how to find them: literature review and benchmarking

Verified Training for Counterfactual Explanation Robustness under Data Shift

Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles