Abstract: Counterfactual explanations have become a mainstay of the XAI field. This particularly intuitive statement allows the user to understand what small but necessary changes would have to be made to a given situation in order to change a model prediction. The quality of a counterfactual depends on several criteria: realism, actionability, validity, robustness, etc. In this paper, we are interested in the notion of robustness of a counterfactual. More precisely, we focus on robustness to counterfactual input changes. This form of robustness is particularly challenging as it involves a trade-off between the robustness of the counterfactual and its proximity to the example to explain. We propose a new framework, CROCO, that generates robust counterfactuals while effectively managing this trade-off, and guarantees the user a minimal level of robustness. An empirical evaluation on tabular datasets confirms the relevance and effectiveness of our approach.

What problem does this paper attempt to address?

The problem this paper addresses is how to make counterfactual explanations robust to input perturbations. Specifically, the authors ask whether a counterfactual remains valid when small changes are made to the counterfactual itself. This form of robustness is particularly challenging because it involves a trade-off between the robustness of a counterfactual and its closeness to the original instance. The paper proposes a new framework, CROCO (Cost-efficient Robust Counterfactuals), which aims to generate robust counterfactual explanations, manage this trade-off effectively, and guarantee users a minimum level of robustness.

### Background and Motivation

With the increasing use of machine-learning models in high-stakes decision-making areas such as healthcare, recruitment, and credit allocation, it has become crucial to provide explanations for individual decisions. The counterfactual explanations proposed by Wachter et al. are an intuitive way to do so. By showing what small but necessary changes would have to be made to a given situation to change the model prediction, they help users understand the model's decision. The quality of a counterfactual explanation depends on multiple criteria, such as realism, actionability, validity, and robustness. Among them, robustness here refers to the ability of a counterfactual explanation to remain valid when its input is perturbed.

### Research Questions

This paper focuses on the robustness of counterfactual explanations, and in particular on robustness to input perturbations. This is challenging because it requires balancing robustness against closeness to the example to be explained. Existing methods such as PROBE attempt to solve this problem, but they have limitations. For example, the robustness guarantee depends on the quality of the estimator, so in practice the resulting counterfactual may be either insufficiently robust or unnecessarily far from the original instance.

### Solutions

To address these problems, the authors propose the CROCO framework. CROCO is based on a new optimization problem and introduces the concept of a "soft recourse invalidation rate" together with an estimator of it. From this, CROCO derives an almost-sure probabilistic upper bound on the true recourse invalidation rate, which ensures that the recourse invalidation rate of the solution returned to the user is below a predetermined target value. Experimental results show that CROCO outperforms existing methods on several tabular datasets, especially in terms of the trade-off between robustness and closeness.

### Experimental Verification

To evaluate CROCO, the authors conducted experiments on three datasets (Adult, COMPAS, GSC). The results show that CROCO not only performs well in terms of robustness but also outperforms existing methods, such as PROBE and the method of Wachter et al., in terms of closeness. In particular, CROCO shows higher stability and robustness on datasets containing a large number of categorical variables.

### Conclusions

This paper addresses the robustness of counterfactual explanations to input perturbations by proposing the CROCO framework. CROCO comes with theoretical guarantees and also performs well in practice, providing a new and effective method for generating robust counterfactual explanations.
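
To make the recourse invalidation rate mentioned above more concrete, the following is a minimal sketch of how such a rate could be estimated by Monte Carlo sampling. It is not the CROCO algorithm and does not implement the paper's soft variant or its upper bound. It assumes isotropic Gaussian perturbations with standard deviation `sigma`, a binary classifier whose desired outcome is a predicted probability of at least 0.5, and a hypothetical logistic model with made-up weights `w` and bias `b`. All names are illustrative.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(x, w, b):
    """Probability of the desired class under a toy logistic model (assumption)."""
    return sigmoid(x @ w + b)


def invalidation_rate(x_cf, w, b, sigma=0.1, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the fraction of perturbed copies of the
    counterfactual x_cf that no longer receive the desired prediction.

    Perturbations are drawn from N(0, sigma^2 I); this noise model is an
    assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, x_cf.shape[0]))
    perturbed = x_cf + noise
    still_valid = predict_proba(perturbed, w, b) >= 0.5
    return 1.0 - still_valid.mean()


# Hypothetical model and two candidate counterfactuals.
w = np.array([1.0, -2.0])
b = 0.0
x_near = np.array([0.05, 0.0])  # just across the decision boundary
x_far = np.array([2.0, 0.0])    # well inside the desired class

rate_near = invalidation_rate(x_near, w, b)
rate_far = invalidation_rate(x_far, w, b)
print(f"near boundary: {rate_near:.3f}, far from boundary: {rate_far:.3f}")
```

In this toy setting, the counterfactual that sits just across the decision boundary is invalidated by a large share of perturbations, while the one further inside the desired class is almost never invalidated. That is the trade-off described in the summary: moving further from the boundary improves robustness but typically increases the distance to the original instance.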

Generating robust counterfactual explanations

Evaluating Robustness of Counterfactual Explanations

A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations

Robust Counterfactual Explanations in Machine Learning: A Survey

Robust Counterfactual Explanations for Tree-Based Ensembles

Counterfactual explanations and how to find them: literature review and benchmarking

Good Counterfactuals and Where to Find Them: A Case-Based Technique for Generating Counterfactuals for Explainable AI (XAI)

Generally-Occurring Model Change for Robust Counterfactual Explanations

Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms

Finding Regions of Counterfactual Explanations via Robust Optimization

Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box

Generating Counterfactual Explanations Using Cardinality Constraints

Counterfactual Explanation and Causal Inference in Service of Robustness in Robot Control

Multi-Objective Counterfactual Explanations

Counterfactual Explanations via Locally-guided Sequential Algorithmic Recourse

Density-based reliable and robust explainer for counterfactual explanation

Generating Counterfactual Explanations with Natural Language

Choose your Data Wisely: A Framework for Semantic Counterfactuals

Convex optimization for actionable & plausible counterfactual explanations

Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change

Generating Feasible and Plausible Counterfactual Explanations for Outcome Prediction of Business Processes