What Does Evaluation of Explainable Artificial Intelligence Actually Tell Us? A Case for Compositional and Contextual Validation of XAI Building Blocks

Kacper Sokol, Julia E. Vogt
DOI: https://doi.org/10.1145/3613905.3651047
2024-03-19
Abstract: Despite significant progress, evaluation of explainable artificial intelligence remains elusive and challenging. In this paper we propose a fine-grained validation framework that is not overly reliant on any one facet of these sociotechnical systems, and that recognises their inherent modular structure: technical building blocks, user-facing explanatory artefacts and social communication protocols. While we concur that user studies are invaluable in assessing the quality and effectiveness of explanation presentation and delivery strategies from the explainees' perspective in a particular deployment context, the underlying explanation generation mechanisms require a separate, predominantly algorithmic validation strategy that accounts for the technical and human-centred desiderata of their (numerical) outputs. Such a comprehensive sociotechnical utility-based evaluation framework could make it possible to reason systematically about the properties and downstream influence of different building blocks from which explainable artificial intelligence systems are composed -- accounting for a diverse range of their engineering and social aspects -- in view of the anticipated use case.
Human-Computer Interaction, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the difficulty of evaluating Explainable Artificial Intelligence (XAI). Despite significant progress in the field, evaluation of XAI remains elusive and challenging. The authors propose a fine-grained validation framework that does not rely too heavily on any single facet of these sociotechnical systems and that recognises their inherent modular structure: technical building blocks, user-facing explanatory artefacts, and social communication protocols. The paper affirms the value of user studies for assessing the quality and effectiveness of explanation presentation and delivery strategies from the explainees' perspective in a particular deployment context. It argues, however, that the underlying explanation generation mechanisms require a separate, predominantly algorithmic validation strategy that accounts for both the technical and the human-centred desiderata of their (numerical) outputs. Such a sociotechnical, utility-based evaluation framework could support systematic reasoning about the properties and downstream influence of the building blocks from which XAI systems are composed, in view of the anticipated use case. The main objective of the paper is therefore to make the case for compositional and contextual validation: evaluating each building block of an XAI system with a method suited to it, and with respect to the setting in which the system will be deployed.