Abstract: Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence (XAI) technique to provide stakeholders of complex machine learning algorithms with explanations for data-driven decisions. The popularity of counterfactual explanations has resulted in a boom in algorithms that generate them. However, different algorithms do not necessarily produce the same explanation for the same instance. Even though multiple possible explanations are beneficial in some contexts, there are circumstances where diversity amongst counterfactual explanations results in a potential disagreement problem among stakeholders. Ethical issues arise when, for example, malicious agents use this diversity to fairwash an unfair machine learning model by hiding sensitive features. As legislators worldwide begin to include the right to explanations for data-driven, high-stakes decisions in their policies, these ethical issues should be understood and addressed. Our literature review on the disagreement problem in XAI reveals that this problem has never been empirically assessed for counterfactual explanations. Therefore, in this work, we conduct a large-scale empirical analysis on 40 data sets, using 12 explanation-generating methods for two black-box models, yielding over 192,000 explanations. Our study finds alarmingly high disagreement levels between the methods tested. When multiple counterfactual explanations are available, a malicious user is able to both exclude and include desired features. This disagreement seems to be driven mainly by the data set characteristics and the type of counterfactual algorithm. XAI centers on the transparency of algorithmic decision-making, but our analysis advocates for transparency about this self-proclaimed transparency.

Do not explain without context: addressing the blind spot of model explanations

Error Analysis of Shapley Value-Based Model Explanations: An Informative Perspective

Unified Explanations in Machine Learning Models: A Perturbation Approach

Model Interpretation and Explainability: Towards Creating Transparency in Prediction Models

Quantifying Explainability in Outcome-Oriented Predictive Process Monitoring

Position: Do Not Explain Vision Models Without Context

Explaining Explanations: An Overview of Interpretability of Machine Learning

From Model Explanation to Data Misinterpretation: Uncovering the Pitfalls of Post Hoc Explainers in Business Research

Insights into Data through Model Behaviour: An Explainability-driven Strategy for Data Auditing for Responsible Computer Vision Applications

Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction

Interpretable and explainable machine learning: A methods‐centric overview with concrete examples

Disagreement amongst counterfactual explanations: how transparency can be misleading

A Comparative Analysis of Model Agnostic Techniques for Explainable Artificial Intelligence

Explaining Explanations in AI

XAudit: A Theoretical Look at Auditing with Explanations

Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition

Teaching Meaningful Explanations

Comprehension Is a Double-Edged Sword: Over-Interpreting Unspecified Information in Intelligible Machine Learning Explanations

On Minimizing the Impact of Dataset Shifts on Actionable Explanations

Disagreement amongst counterfactual explanations: How transparency can be deceptive