Abstract: Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence (XAI) technique to provide stakeholders of complex machine learning algorithms with explanations for data-driven decisions. The popularity of counterfactual explanations has resulted in a boom in the algorithms generating them. However, different algorithms do not necessarily produce the same explanation for the same instance. Even though in some contexts multiple possible explanations are beneficial, there are circumstances where diversity amongst counterfactual explanations results in a potential disagreement problem among stakeholders. Ethical issues arise when, for example, malicious agents use this diversity to fairwash an unfair machine learning model by hiding sensitive features. As legislators worldwide begin to include the right to explanations for data-driven, high-stakes decisions in their policies, these ethical issues should be understood and addressed. Our literature review on the disagreement problem in XAI reveals that this problem has never been empirically assessed for counterfactual explanations. Therefore, in this work, we conduct a large-scale empirical analysis on 40 datasets, using 12 explanation-generating methods, for two black-box models, yielding over 192,000 explanations. Our study finds alarmingly high disagreement levels between the methods tested. A malicious user is able to both exclude and include desired features when multiple counterfactual explanations are available. This disagreement seems to be driven mainly by the dataset characteristics and the type of counterfactual algorithm. XAI centers on the transparency of algorithmic decision-making, but our analysis advocates for transparency about this self-proclaimed transparency.

Deceptive XAI: Typology, Creation and Detection

Deceptive AI Explanations: Creation and Detection

Explainable AI for Cheating Detection and Churn Prediction in Online Games

"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations

Why do explanations fail? A typology and discussion on failures in XAI

Disagreement amongst counterfactual explanations: How transparency can be deceptive

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models

A Turing Test for Transparency

Don't be Fooled: The Misinformation Effect of Explanations in Human-AI Collaboration

Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation

Deliberative XAI: How Explanations Impact Understanding and Decision-Making of AI Novices in Collective and Individual Settings

Explaining Explanations: An Overview of Interpretability of Machine Learning

SIDEs: Separating Idealization from Deceptive Explanations in xAI

Deception and Manipulation in Generative AI

X Hacking: The Threat of Misguided AutoML

Explaining Any ML Model? -- On Goals and Capabilities of XAI

Transparency and Trust in Human-AI-Interaction: The Role of Model-Agnostic Explanations in Computer Vision-Based Decision Support

Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting

Axe the X in XAI: A Plea for Understandable AI

Can Explainable AI Explain Unfairness? A Framework for Evaluating Explainable AI