Abstract: Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied robustness based on the perturbation of input instances. However, robustness defined from the perspective of perturbed instances is sometimes biased, because this definition ignores the impact of learning algorithms on robustness. In this paper, we propose a more reasonable definition, Weak Robust Compatibility, based on the perspective of explanation strength. In practice, we propose WRC-Test to help us generate more robust counterfactuals, and we design experiments to verify its effectiveness. Theoretically, we introduce the concepts of PAC learning theory and define the concept of PAC WRC-Approximability. Based on reasonable assumptions, we establish oracle inequalities about weak robustness, which give a sufficient condition for PAC WRC-Approximability.
What problem does this paper attempt to address?
The paper asks how to systematically evaluate the robustness of counterfactual explanations in a way that accounts for both the Counterfactual Explanation Generation Algorithm (CEGA) and the Learning Algorithm (LA). Existing research mainly defines robustness in terms of perturbations of the input instance, but this definition is sometimes biased because it ignores the influence of the learning algorithm on robustness. The paper therefore proposes a new definition based on the strength of explanations, Weak Robust Compatibility (WRC), to evaluate the robustness of counterfactual explanations more reasonably.
### Main contributions of the paper
1. **Conceptually**: Section 4 proposes a new concept, Weak Robust Compatibility (WRC). It takes into account the characteristics of both the CEGA and the LA, and the paper argues that it is more reasonable than earlier definitions in the literature.
2. **In application**: Section 4 proposes the WRC-Test to help generate more robust counterfactual explanations, and Section 6 verifies its effectiveness experimentally.
3. **Theoretically**: Section 5 introduces the concept of PAC WRC-Approximability and, under reasonable assumptions, establishes an oracle inequality for weak robustness that provides a sufficient condition for PAC WRC-Approximability. This result deepens our understanding of robustness in the field of counterfactual explanations.
### Main technical details
#### 4.1 Mathematical formulation of the Counterfactual Explanation Generation Algorithm (CEGA)
Definition 1 (CEGA): Given a hypothesis space \(H\) and an instance space \(X\), a CEGA \(C\) is a mapping:
\[C: H\times X\rightarrow X, (h, x)\mapsto C(h, x)\]
It sends an instance \(x\) to its counterfactual explanation \(C(h, x)\) relative to the classifier \(h\).
Definition 2 (CEGA induced by distance function \(d(\cdot,\cdot)\)): Given a hypothesis space \(H\), an instance space \(X\) and a distance function \(d(\cdot,\cdot)\), define the CEGA \(C_d\) induced by \(d(\cdot,\cdot)\) as:
\[C_d: H\times X\rightarrow X, (h, x)\mapsto\arg\min_{x'\in N(x)}d(x', x)\]
where \(N(x)=\{y\in X: h(x)\neq h(y)\}\) is the set of instances that \(h\) classifies differently from \(x\).
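Definition 2 can be illustrated with a minimal sketch. It assumes a finite pool of candidate instances in place of the full set \(N(x)\), Euclidean distance for \(d\), and a toy threshold classifier; the names `induced_cega`, `threshold_classifier`, and `grid` are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def induced_cega(
    h: Callable[[Point], int],
    x: Point,
    candidates: Sequence[Point],
    d: Callable[[Point, Point], float] = math.dist,
) -> Point:
    """Return the candidate closest to x (under d) that h labels differently.

    Assumption: the arg min over N(x) is approximated by a search over a
    finite candidate pool instead of the whole instance space.
    """
    label = h(x)
    # Finite stand-in for N(x): candidates whose predicted label differs.
    flipped = [c for c in candidates if h(c) != label]
    if not flipped:
        raise ValueError("no candidate receives a different label than x")
    return min(flipped, key=lambda c: d(c, x))


def threshold_classifier(p: Point) -> int:
    """Toy classifier (assumption): label 1 when the first feature exceeds 0.55."""
    return int(p[0] > 0.55)


# Candidate pool: a regular grid on the unit square.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
x = (0.2, 0.3)
cf = induced_cega(threshold_classifier, x, grid)
print(cf, threshold_classifier(x), threshold_classifier(cf))
```

On a continuous instance space the minimiser may not exist or may not be unique, so a practical implementation would need a tie-breaking rule and an optimiser rather than exhaustive search.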
#### 4.2 Strong Robust Compatibility (SRC) and Weak Robust Compatibility (WRC)
Definition 3 (Strong Robust Compatibility, SRC): Consider the instance space \((X, d)\), where \(X\) is a subset of the \(K\)-dimensional Euclidean space \(\mathbb{R}^K\), a learning algorithm \(L\), a CEGA \(C\), and a function \(\varphi\in\Phi\). Let \(D_T\) be a data set of size \(T\) drawn from the distribution \(D^{\otimes T}\), and \(L(D_T) = h_T\). Define the strong robust compatibility (SRC) of \(C\) relative to \(h_T\) at \(x\) as:
\[ \text{SRC}_\varphi^x(C, h_T)=\int_{X\cap B(x, r)}\Delta(x, y)\cdot\varphi(d(x, y))\, dy\]
where \(B(x, r)\) denotes the ball of radius \(r\) centered at \(x\), and
\[ \Delta(x, y)=d(C(h_T, x), C(h_T, y))\]
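The SRC integral generally has no closed form, so one way to approximate it is plain Monte Carlo integration over the ball. The sketch below is an illustration under stated assumptions, not the paper's procedure: it takes Euclidean distance for \(d\), samples the ball by rejection from its bounding cube, and uses a toy CEGA that projects a point onto a vertical decision boundary. The names `src_estimate`, `toy_cega`, and `in_domain` are hypothetical, and the weight \(\varphi(t)=e^{-t}\) is an arbitrary choice because the class \(\Phi\) is not specified here.

```python
import math
import random
from typing import Callable, Tuple

Point = Tuple[float, ...]


def ball_volume(k: int, r: float) -> float:
    """Volume of a k-dimensional Euclidean ball of radius r."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1) * r ** k


def sample_in_ball(x: Point, r: float, rng: random.Random) -> Point:
    """Uniform sample from B(x, r) by rejection from the bounding cube."""
    while True:
        y = tuple(xi + rng.uniform(-r, r) for xi in x)
        if math.dist(x, y) <= r:
            return y


def src_estimate(
    cega: Callable[[Callable[[Point], int], Point], Point],
    h: Callable[[Point], int],
    x: Point,
    r: float,
    phi: Callable[[float], float],
    in_domain: Callable[[Point], bool] = lambda y: True,
    n: int = 20000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the integral of Delta(x, y) * phi(d(x, y))
    over the intersection of X and B(x, r)."""
    rng = random.Random(seed)
    cx = cega(h, x)
    total = 0.0
    for _ in range(n):
        y = sample_in_ball(x, r, rng)
        if in_domain(y):  # samples outside X contribute zero
            delta = math.dist(cx, cega(h, y))
            total += delta * phi(math.dist(x, y))
    # Mean of the integrand over the ball, scaled by the ball's volume.
    return ball_volume(len(x), r) * total / n


def h(p: Point) -> int:
    """Toy classifier (assumption): label 1 when the first feature exceeds 0.5."""
    return int(p[0] > 0.5)


def toy_cega(h: Callable[[Point], int], p: Point) -> Point:
    """Toy CEGA (assumption): project p onto the boundary x_1 = 0.5."""
    return (0.5, p[1])


print(src_estimate(toy_cega, h, (0.2, 0.3), 0.3, lambda t: math.exp(-t)))
```

For this toy CEGA, \(\Delta(x, y)\) reduces to the difference in the second coordinate, so with \(\varphi\equiv 1\) in two dimensions the integral equals \(4r^3/3\), which gives a quick sanity check on the estimator. Rejection sampling becomes inefficient as \(K\) grows, so a higher-dimensional setting would call for a direct ball sampler.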
Definition 4 (Weak Robust Compatibility, WRC): Consider