Critical Empirical Study on Black-box Explanations in AI

Jean-Marie John-Mathews
DOI: https://doi.org/10.48550/arXiv.2109.15067
2021-09-29
Abstract: This paper raises empirical concerns about post-hoc explanations of black-box ML models, one of the major trends in AI explainability (XAI), by showing their lack of interpretability and their societal consequences. Using a representative consumer panel to test our assumptions, we report three main findings. First, we show that post-hoc explanations of black-box models tend to give partial and biased information on the underlying mechanism of the algorithm and can be subject to manipulation or information withholding by diverting users' attention. Second, we show the importance of tested behavioral indicators, in addition to self-reported perceived indicators, to provide a more comprehensive view of the dimensions of interpretability. This paper contributes to shedding new light on the current theoretical debate between intrinsically transparent AI models and post-hoc explanations of complex black-box models, a debate which is likely to play a highly influential role in the future development and operationalization of AI systems.
Human-Computer Interaction
What problem does this paper attempt to address?
This paper explores the limitations and societal impacts of post-hoc explanations of black-box models in artificial intelligence. Through empirical research, it challenges such explanations, showing that they often provide partial and biased information and can be used to manipulate users or divert their attention. The paper also emphasizes the need to test behavioral indicators, in addition to self-reported perception indicators, to assess the dimensions of interpretability more comprehensively. These results inform the theoretical debate between intrinsically transparent models and post-hoc explanations of black-box models, and they bear on the future development and practical deployment of trustworthy AI systems.