Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks

Xiaolei Lu, Jianghong Ma
2024-03-30
Abstract: Explainability algorithms aimed at interpreting decision-making AI systems usually seek to balance two critical dimensions: 1) faithfulness, where explanations accurately reflect the model's inference process; and 2) plausibility, where explanations are consistent with the interpretations of domain experts. However, do faithfulness and plausibility inherently conflict? In this study, through a comprehensive quantitative comparison between the explanations from the selected explainability methods and expert-level interpretations across three NLP tasks (sentiment analysis, intent detection, and topic labeling), we demonstrate that the traditional perturbation-based methods Shapley value and LIME could attain greater faithfulness and plausibility. Our findings suggest that, rather than optimizing for one dimension at the expense of the other, we could seek to optimize explainability algorithms with dual objectives so that their explanations achieve both high accuracy and user accessibility.
Artificial Intelligence, Machine Learning
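The abstract names the Shapley value as a perturbation-based explainability method. As a rough illustration of what that means for token attributions, the Python sketch below computes exact Shapley values by masking subsets of tokens and weighting each token's marginal effect on a model score. The `[MASK]` placeholder, the helper names, and the `toy_sentiment_score` scorer are hypothetical stand-ins chosen for this example, not the paper's models or implementation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

MASK = "[MASK]"  # hypothetical placeholder that "removes" a token from the input


def masked(tokens: Sequence[str], keep: Set[int]) -> List[str]:
    """Keep tokens whose index is in `keep`; replace all others with MASK."""
    return [tok if i in keep else MASK for i, tok in enumerate(tokens)]


def shapley_token_attributions(
    tokens: Sequence[str], value_fn: Callable[[List[str]], float]
) -> List[float]:
    """Exact Shapley value of each token, enumerating every coalition.

    This needs O(n * 2^n) model calls, so it is only feasible for very
    short inputs; practical tools approximate it by sampling.
    """
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalitions of 0..n-1 other tokens
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                with_i = value_fn(masked(tokens, coalition | {i}))
                without_i = value_fn(masked(tokens, coalition))
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_sentiment_score(tokens: Sequence[str]) -> float:
    """Hypothetical scorer: 'great' and 'movie' raise the score, but
    'not' combined with 'great' outweighs the praise."""
    present = set(tokens)
    score = 0.0
    if "great" in present:
        score += 1.0
    if "movie" in present:
        score += 0.5
    if "not" in present and "great" in present:
        score -= 2.0
    return score


tokens = ["not", "great", "movie"]
phi = shapley_token_attributions(tokens, toy_sentiment_score)
print([round(p, 3) for p in phi])  # [-1.0, 0.0, 0.5]
```

In this toy case the negative interaction between "not" and "great" is split evenly between the two tokens, and the attributions sum to the full-input score minus the fully masked score (the efficiency property of Shapley values). Because exhaustive enumeration is exponential in input length, real text models are usually explained with sampling-based approximations instead.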
What problem does this paper attempt to address?
The problem this paper attempts to solve is whether the faithfulness and plausibility of explanations for AI decision-making systems inherently conflict. Specifically, the authors examine whether explanations produced by different explainability methods (such as Shapley values and LIME) can be both highly faithful and highly plausible on natural language processing tasks. By measuring how closely each method's explanations agree with expert-level interpretations on three NLP tasks (sentiment analysis, intent detection, and topic labeling), the paper assesses whether explainability algorithms could be optimized to achieve both properties at once. The study turns on two dimensions of an explanation:

1. **Faithfulness**: the explanation accurately reflects the model's reasoning process.
2. **Plausibility**: the explanation agrees with the reasoning of domain experts.

Through this empirical study, the authors find that traditional perturbation-based methods (such as Shapley values and LIME) achieve relatively high faithfulness and plausibility simultaneously across multiple NLP tasks, challenging the earlier view that the two must be traded off against each other. This suggests that, rather than improving one dimension at the expense of the other, explainability algorithms could be optimized with both as joint objectives.
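This summary does not spell out how the two dimensions are measured. One common way to operationalize them, assumed here rather than taken from the paper, is to score faithfulness with a deletion test (how much the model's score drops when the top-k attributed tokens are masked, in the spirit of the comprehensiveness metric used in rationale benchmarks such as ERASER) and to score plausibility as the F1 overlap between the top-k attributed tokens and the tokens an expert highlighted. The scorer, lexicon, attribution values, and expert rationale below are hypothetical illustrative values.

```python
from typing import Callable, List, Sequence, Set

MASK = "[MASK]"  # hypothetical placeholder that "removes" a token


def top_k_indices(attributions: Sequence[float], k: int) -> List[int]:
    """Indices of the k tokens with the largest (signed) attribution scores."""
    order = sorted(range(len(attributions)), key=lambda i: attributions[i], reverse=True)
    return order[:k]


def comprehensiveness(
    tokens: Sequence[str],
    attributions: Sequence[float],
    score_fn: Callable[[Sequence[str]], float],
    k: int,
) -> float:
    """Faithfulness proxy: drop in model score after masking the top-k tokens."""
    top = set(top_k_indices(attributions, k))
    reduced = [MASK if i in top else tok for i, tok in enumerate(tokens)]
    return score_fn(tokens) - score_fn(reduced)


def plausibility_f1(attributions: Sequence[float], expert_indices: Set[int], k: int) -> float:
    """Plausibility proxy: F1 between top-k tokens and expert-highlighted tokens."""
    predicted = set(top_k_indices(attributions, k))
    gold = set(expert_indices)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


LEXICON = {"wonderful": 0.3, "moving": 0.2}  # hypothetical word weights


def toy_positive_score(tokens: Sequence[str]) -> float:
    """Hypothetical sentiment scorer: 0.5 baseline plus lexicon weights."""
    return 0.5 + sum(LEXICON.get(tok, 0.0) for tok in tokens)


tokens = ["the", "plot", "was", "wonderful", "and", "moving"]
attributions = [0.0, 0.05, 0.0, 0.3, 0.0, 0.2]  # e.g. from Shapley values or LIME
expert = {3, 4, 5}  # hypothetical expert highlight: "wonderful and moving"

comp = comprehensiveness(tokens, attributions, toy_positive_score, k=2)
f1 = plausibility_f1(attributions, expert, k=2)
print(f"comprehensiveness={comp:.2f}, plausibility F1={f1:.2f}")
# comprehensiveness=0.50, plausibility F1=0.80
```

Ranking tokens by signed attribution is itself a design choice; ranking by absolute value is a common alternative when evidence against the predicted label should also count as important.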