Abstract: As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.

What problem does this paper attempt to address?

The problem this paper addresses is the "disagreement problem" in explainable machine learning (XAI). As post hoc explanation methods are increasingly applied to explain complex machine learning models, especially in high-stakes settings, it becomes crucial to understand whether and when these methods produce contradictory explanations. However, there is currently very little research on how to evaluate and resolve disagreements between explanation methods.

### Core problems of the paper

1. **Disagreements between explanation methods**: Explanations generated by different methods for the same model prediction may be inconsistent, which can lead to misleading conclusions.
2. **Handling of disagreements in practical applications**: In practice, how do data scientists handle these disagreements, and what heuristics do they rely on to resolve them?

### Solutions

To address these problems, the authors carried out the following tasks:

1. **Interviews and surveys**: Through semi-structured interviews with 25 data scientists, they identified the explanation disagreements practitioners encounter in their actual work and defined what constitutes an explanation disagreement.
2. **Formalizing the concept of disagreement**: They propose a novel quantitative framework for formalizing and quantifying disagreement between explanation methods. It comprises six measures: feature agreement, rank agreement, sign agreement, signed rank agreement, rank correlation, and pairwise rank agreement.
3. **Empirical analysis**: They use four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models to measure the degree of disagreement between explanation methods.
4. **User study**: Through an online user study, they examine the decision-making processes and strategies data scientists use when facing explanation disagreements.

### Main findings

- **Common and serious**: 84% of the respondents said that they often encounter explanation disagreements in their actual work.
- **Lack of standardized methods**: 86% of the participants admitted to using informal heuristics or being uncertain about how to resolve explanation disagreements.
- **Need for better evaluation criteria**: The results emphasize the importance of developing principled ways to evaluate and compare explanation methods, so that misleading explanations do not affect critical decisions.

### Conclusion

This study shows that the disagreement problem in explainable machine learning is widespread, and that data scientists currently rely mainly on informal methods to handle these disagreements. This highlights the need for more effective evaluation and comparison of explanation methods to ensure that explanations are consistent and reliable enough to support sound decision-making.
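The six agreement measures named in the summary above can be illustrated with a minimal sketch. The summary gives only their names, so the definitions below are assumptions based on a common reading of them: each explanation is a list of signed feature attributions, the top-k features are those with the largest absolute attribution, and rank correlation is Spearman's coefficient computed without tie handling. All function names and the example values are hypothetical.

```python
from itertools import combinations

def top_k(attr, k):
    """Feature indices with the k largest absolute attributions, in rank order."""
    return sorted(range(len(attr)), key=lambda i: -abs(attr[i]))[:k]

def sign(x):
    return (x > 0) - (x < 0)

def feature_agreement(a, b, k):
    # Fraction of top-k features shared by both explanations.
    return len(set(top_k(a, k)) & set(top_k(b, k))) / k

def rank_agreement(a, b, k):
    # Fraction of top-k positions occupied by the same feature in both.
    return sum(fa == fb for fa, fb in zip(top_k(a, k), top_k(b, k))) / k

def sign_agreement(a, b, k):
    # Fraction of top-k features that are shared and carry the same sign.
    common = set(top_k(a, k)) & set(top_k(b, k))
    return sum(sign(a[i]) == sign(b[i]) for i in common) / k

def signed_rank_agreement(a, b, k):
    # Same feature at the same position with the same sign.
    pairs = zip(top_k(a, k), top_k(b, k))
    return sum(fa == fb and sign(a[fa]) == sign(b[fb]) for fa, fb in pairs) / k

def rank_correlation(a, b, features):
    # Spearman's rho over the chosen features (assumes no ties, at least 2 features).
    def ranks(attr):
        order = sorted(features, key=lambda i: -abs(attr[i]))
        return {f: r for r, f in enumerate(order)}
    ra, rb = ranks(a), ranks(b)
    n = len(features)
    d2 = sum((ra[f] - rb[f]) ** 2 for f in features)
    return 1 - 6 * d2 / (n * (n * n - 1))

def pairwise_rank_agreement(a, b, features):
    # Fraction of feature pairs whose relative importance order matches.
    pairs = list(combinations(features, 2))
    same = sum((abs(a[i]) > abs(a[j])) == (abs(b[i]) > abs(b[j])) for i, j in pairs)
    return same / len(pairs)

# Hypothetical attributions from two explanation methods for one prediction.
expl_a = [0.5, -0.3, 0.1, 0.05]
expl_b = [0.4, 0.35, -0.02, 0.2]
print(feature_agreement(expl_a, expl_b, k=2))
print(sign_agreement(expl_a, expl_b, k=2))
print(pairwise_rank_agreement(expl_a, expl_b, features=[0, 1, 2, 3]))
```

In this toy case the two methods agree on which two features matter most and on their order, yet disagree on the direction of the second feature's effect, so sign agreement falls to 0.5 while feature agreement stays at 1.0. This is the kind of partial disagreement the separate measures are meant to distinguish.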

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance

Disagreement amongst counterfactual explanations: How transparency can be deceptive

Fighting the disagreement in Explainable Machine Learning with consensus

Manipulation Risks in Explainable AI: The Implications of the Disagreement Problem

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

Explainable machine learning in deployment

Stop ordering machine learning algorithms by their explainability! A user-centered investigation of performance and explainability

Explainable News Summarization -- Analysis and mitigation of Disagreement Problem

How do ML practitioners perceive explainability? an interview study of practices and challenges

Unified Explanations in Machine Learning Models: A Perturbation Approach

Rethinking Explainability as a Dialogue: A Practitioner's Perspective

"Why Should You Trust My Explanation?" Understanding Uncertainty in LIME Explanations

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

Clarity in complexity: how aggregating explanations resolves the disagreement problem

Understanding Disparities in Post Hoc Machine Learning Explanation

Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence

Altruist: Argumentative Explanations through Local Interpretations of Predictive Models

Explainable machine learning for public policy: Use cases, gaps, and research directions

Abduction and Argumentation for Explainable Machine Learning: A Position Survey

Interpretable and explainable machine learning: A methods‐centric overview with concrete examples