Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks

Xiaolei Lu, Jianghong Ma

2024-03-30

Abstract: Explainability algorithms aimed at interpreting decision-making AI systems usually consider balancing two critical dimensions: 1) faithfulness, where explanations accurately reflect the model's inference process; and 2) plausibility, where explanations are consistent with the interpretations of domain experts. However, the question arises: do faithfulness and plausibility inherently conflict? In this study, through a comprehensive quantitative comparison between the explanations from the selected explainability methods and expert-level interpretations across three NLP tasks (sentiment analysis, intent detection, and topic labeling), we demonstrate that the traditional perturbation-based methods Shapley value and LIME could attain greater faithfulness and plausibility. Our findings suggest that rather than optimizing for one dimension at the expense of the other, we could seek to optimize explainability algorithms with dual objectives to achieve high levels of accuracy and user accessibility in their explanations.

Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper asks whether the faithfulness and plausibility of explanations inherently conflict when explaining AI decision-making systems. Specifically, the authors examine whether explanations generated by different explanation methods (such as Shapley values and LIME) can achieve both high faithfulness and high plausibility in natural language processing tasks. By comparing the explanations from these methods with expert-level interpretations, the paper evaluates them on three NLP tasks: sentiment analysis, intent detection, and topic labeling.

The study is built around two dimensions:

1. **Faithfulness**: The explanation accurately reflects the model's reasoning process.
2. **Plausibility**: The explanation is consistent with the interpretations of domain experts.

Through empirical study, the authors find that traditional perturbation-based methods (Shapley values and LIME) can achieve relatively high faithfulness and plausibility at the same time across these NLP tasks, which challenges the view that the two must be traded off against each other. This suggests that explanation algorithms can be optimized for both objectives, improving one dimension without sacrificing the other.
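To make the two dimensions concrete, the following is a minimal sketch, not the paper's actual evaluation protocol. It computes exact Shapley values for the tokens of a short sentence, then scores the resulting explanation with one common faithfulness proxy (the score drop when the top-attributed tokens are deleted) and one common plausibility proxy (F1 overlap with an expert-marked rationale). The toy lexicon model, the function names, the choice of proxies, and the expert rationale are all assumptions made for illustration; none of them come from the paper.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy "model": a sentiment score from a fixed word-weight lexicon.
WEIGHTS = {"great": 2.0, "boring": -1.5}


def model_score(tokens):
    """Toy positive-sentiment score: sum of the lexicon weights of the tokens."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


def shapley_values(tokens, score):
    """Exact Shapley value per token position (only feasible for short inputs)."""
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain token i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                without_i = [tokens[j] for j in subset]
                phi += weight * (score(with_i) - score(without_i))
        values.append(phi)
    return values


def top_k(attributions, k):
    """Indices of the k tokens with the largest absolute attribution."""
    ranked = sorted(range(len(attributions)),
                    key=lambda i: abs(attributions[i]), reverse=True)
    return set(ranked[:k])


def comprehensiveness(tokens, score, selected):
    """Faithfulness proxy: score change when the selected tokens are deleted."""
    kept = [t for i, t in enumerate(tokens) if i not in selected]
    return score(tokens) - score(kept)


def rationale_f1(selected, expert):
    """Plausibility proxy: F1 overlap between selected and expert-marked tokens."""
    overlap = len(selected & expert)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(expert)
    return 2 * precision * recall / (precision + recall)


tokens = ["the", "movie", "was", "great"]
phi = shapley_values(tokens, model_score)
selected = top_k(phi, 1)   # explanation: the single most important token
expert = {3}               # hypothetical expert rationale: the word "great"

print("Shapley values:", phi)
print("faithfulness (comprehensiveness):",
      comprehensiveness(tokens, model_score, selected))
print("plausibility (rationale F1):", rationale_f1(selected, expert))
```

Because the toy model is additive, each token's Shapley value equals its lexicon weight, so the explanation is both faithful and plausible by construction. With a real classifier the two scores are measured independently, which is what allows a study to test whether they move together or against each other.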

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

Towards Faithful Model Explanation in NLP: A Survey

Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations

Faithfulness Tests for Natural Language Explanations

New Faithfulness-Centric Interpretability Paradigms for Natural Language Processing

Evaluating Human Alignment and Model Faithfulness of LLM Rationale

Explainability of Automated Fact Verification Systems: A Comprehensive Review

The XAI Alignment Problem: Rethinking How Should We Evaluate Human-Centered AI Explainability Techniques

FaithLM: Towards Faithful Explanations for Large Language Models

Disagreement amongst counterfactual explanations: How transparency can be deceptive

On Measuring Faithfulness or Self-consistency of Natural Language Explanations

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models

Why is plausibility surprisingly problematic as an XAI criterion?

Is Ignorance Bliss? The Role of Post Hoc Explanation Faithfulness and Alignment in Model Trust in Laypeople and Domain Experts

On the Interplay between Fairness and Explainability

Exploring Effectiveness of Explanations for Appropriate Trust: Lessons from Cognitive Psychology

Altruist: Argumentative Explanations through Local Interpretations of Predictive Models

Flexible and Context-Specific AI Explainability: A Multidisciplinary Approach