Abstract: Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied robustness based on the perturbation of input instances. However, robustness defined from the perspective of perturbed instances is sometimes biased, because this definition ignores the impact of learning algorithms on robustness. In this paper, we propose a more reasonable definition, Weak Robust Compatibility, based on the perspective of explanation strength. In practice, we propose WRC-Test to help us generate more robust counterfactuals, and we design experiments to verify the effectiveness of WRC-Test. Theoretically, we introduce the concepts of PAC learning theory and define the concept of PAC WRC-Approximability. Based on reasonable assumptions, we establish oracle inequalities about weak robustness, which give a sufficient condition for PAC WRC-Approximability.

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to systematically evaluate the robustness of counterfactual explanations in a way that accounts for both the Counterfactual Explanation Generation Algorithm (CEGA) and the Learning Algorithm (LA). Existing research mainly defines robustness from the perspective of input instance perturbation, but this definition is sometimes biased because it ignores the influence of the learning algorithm on robustness. The paper therefore proposes a new definition based on the strength of explanations, Weak Robust Compatibility (WRC), to evaluate the robustness of counterfactual explanations more reasonably.

### Main contributions of the paper

1. **Conceptually**: In Section 4, a new concept, Weak Robust Compatibility (WRC), is proposed. This concept takes into account the characteristics of both CEGA and LA and is more reasonable than previous views in the literature.
2. **In application**: In Section 4, the WRC-Test is proposed to help generate more robust counterfactual explanations, and in Section 6, the effectiveness of the WRC-Test is verified through experiments.
3. **Theoretically**: In Section 5, the concept of PAC WRC-Approximability is introduced, and based on reasonable assumptions, an oracle inequality regarding weak robustness is established, providing sufficient conditions for PAC WRC-Approximability. This result deepens our understanding of robustness in the field of counterfactual explanations.

### Main technical details

#### 4.1 Mathematical formulation of the Counterfactual Explanation Generation Algorithm (CEGA)

Definition 1 (CEGA): Given a hypothesis space \(H\) and an instance space \(X\), define a CEGA \(C\) as a mapping:

\[C: H\times X\rightarrow X, \quad (h, x)\mapsto C(h, x)\]

This mapping sends an instance \(x\) to its counterfactual explanation relative to the classifier \(h\).
Definition 2 (CEGA induced by distance function \(d(\cdot,\cdot)\)): Given a hypothesis space \(H\), an instance space \(X\) and a distance function \(d(\cdot,\cdot)\), define the CEGA \(C_d\) induced by \(d(\cdot,\cdot)\) as:

\[C_d: H\times X\rightarrow X, \quad (h, x)\mapsto\arg\min_{x'\in N(x)}d(x', x)\]

where \(z=(x, y)\) and \(N(x)=\{x'\in X: h(x')\neq h(x)\}\).

#### 4.2 Strong Robust Compatibility (SRC) and Weak Robust Compatibility (WRC)

Definition 3 (Strong Robust Compatibility, SRC): Consider the instance space \((X, d)\), where \(X\) is a subset of the \(K\)-dimensional Euclidean space \(\mathbb{R}^K\), a learning algorithm \(L\), a CEGA \(C\), and a function \(\varphi\in\Phi\). Let \(D_T\) be a data set of size \(T\) drawn from the distribution \(D^{\otimes T}\), and \(L(D_T) = h_T\). Define the strong robust compatibility (SRC) of \(C\) relative to \(h_T\) at \(x\) as:

\[\text{SRC}_\varphi^x(C, h_T)=\int_{X\cap B(x, r)}\Delta(x, y)\cdot\varphi(d(x, y))\, dy\]

where \(B(x, r)\) denotes the ball centered at \(x\) with radius \(r\), and

\[\Delta(x, y)=d(C(h_T, x), C(h_T, y))\]

Definition 4 (Weak Robust Compatibility, WRC): Consider
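Definitions 2 and 3 can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's implementation: the arg min in \(C_d\) is taken over a finite candidate grid rather than all of \(X\), the SRC integral is approximated by Monte Carlo sampling under the assumption that \(B(x, r)\subseteq X\), and the classifier `h`, the grid, the weight `phi` and all function names are hypothetical choices made for the example.

```python
import math
import random
from itertools import product


def euclidean(a, b):
    """Euclidean distance d(a, b) between two points given as tuples."""
    return math.dist(a, b)


def induced_cega(h, x, candidates, d=euclidean):
    """Finite-candidate version of C_d(h, x) = argmin_{x' in N(x)} d(x', x)."""
    # N(x) restricted to the candidate set: points labelled differently from x.
    flipped = [c for c in candidates if h(c) != h(x)]
    if not flipped:
        raise ValueError("no candidate receives a different label than x")
    return min(flipped, key=lambda c: d(c, x))


def ball_volume(k, r):
    """Lebesgue volume of a k-dimensional Euclidean ball of radius r."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1) * r ** k


def sample_ball(x, r, rng):
    """Draw one point uniformly from B(x, r) by rejection sampling from a cube."""
    while True:
        offset = [rng.uniform(-r, r) for _ in x]
        if math.hypot(*offset) <= r:
            return tuple(xi + oi for xi, oi in zip(x, offset))


def estimate_src(cega, h, x, r, phi, d=euclidean, n_samples=500, seed=0):
    """Monte Carlo estimate of the SRC integral, assuming B(x, r) lies inside X."""
    rng = random.Random(seed)
    cf_x = cega(h, x)
    total = 0.0
    for _ in range(n_samples):
        y = sample_ball(x, r, rng)
        delta = d(cf_x, cega(h, y))  # Delta(x, y) = d(C(h, x), C(h, y))
        total += delta * phi(d(x, y))
    # Integral ~ volume of the ball times the average integrand value.
    return ball_volume(len(x), r) * total / n_samples


# Hypothetical setup: a threshold classifier on R^2 and a 0.5-spaced grid.
h = lambda p: int(p[0] >= 0.0)
grid = [(0.5 * i, 0.5 * j) for i, j in product(range(-4, 5), repeat=2)]
cega = lambda clf, p: induced_cega(clf, p, grid)

x = (-1.0, 0.2)
cf = cega(h, x)  # nearest grid point on the other side of the boundary
src = estimate_src(cega, h, x, r=0.3, phi=lambda t: 1.0)
```

A smaller `src` value means the counterfactuals of points near `x` stay close to the counterfactual of `x` itself. In the degenerate case of a CEGA that returns the same point for every input, \(\Delta(x, y)=0\) everywhere and the estimate is exactly zero.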

Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms
