Abstract: There has been a significant surge of interest recently in explainable artificial intelligence (XAI), where the goal is to produce an interpretation of a decision made by a machine learning algorithm. Of particular interest is the interpretation of how deep neural networks make decisions, given the complexity and 'black box' nature of such networks. Given the infancy of the field, there has been very limited exploration into assessing the performance of explainability methods, with most evaluations centered around subjective visual assessment of the produced interpretations. In this study, we explore a more machine-centric strategy for quantifying the performance of explainability methods on deep neural networks via the notion of decision-making impact analysis. We introduce two quantitative performance metrics: i) Impact Score, which assesses the percentage of critical factors with either strong confidence reduction impact or decision changing impact, and ii) Impact Coverage, which assesses the percentage coverage of adversarially impacted factors in the input. A comprehensive analysis using this approach was conducted on several state-of-the-art explainability methods (LIME, SHAP, Expected Gradients, GSInquire) on a ResNet-50 deep convolutional neural network using a subset of ImageNet for the task of image classification. Experimental results show that the critical regions identified by LIME within the tested images had the lowest impact on the decision-making process of the network (~38%), with a progressive increase in decision-making impact for SHAP (~44%), Expected Gradients (~51%), and GSInquire (~76%). While by no means perfect, the hope is that the proposed machine-centric strategy helps push the conversation forward towards better metrics for evaluating explainability methods and improves trust in deep neural networks.
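The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give exact formulas, so everything concrete here is an assumption made for illustration: the function names `impact_score` and `impact_coverage`, the 50% relative confidence drop used as the cutoff for "strong confidence reduction", and the use of intersection-over-union between the explanation's critical region and the adversarially perturbed region as the coverage measure.

```python
import numpy as np


def impact_score(pred_orig, conf_orig, pred_masked, conf_masked, drop_threshold=0.5):
    """Fraction of samples whose decision changes, or whose confidence drops
    strongly, once the identified critical factors are removed from the input.

    drop_threshold is an ASSUMED cutoff: a sample counts as a strong confidence
    reduction when the masked confidence falls to (1 - drop_threshold) of the
    original confidence or lower.
    """
    pred_orig = np.asarray(pred_orig)
    pred_masked = np.asarray(pred_masked)
    conf_orig = np.asarray(conf_orig, dtype=float)
    conf_masked = np.asarray(conf_masked, dtype=float)

    decision_changed = pred_orig != pred_masked
    strong_drop = conf_masked <= (1.0 - drop_threshold) * conf_orig
    return float(np.mean(decision_changed | strong_drop))


def impact_coverage(critical_mask, adversarial_mask):
    """Overlap between the explanation's critical region and the region that
    was adversarially perturbed, measured here (as an ASSUMPTION) by
    intersection-over-union of two boolean masks of the same shape.
    """
    critical = np.asarray(critical_mask, dtype=bool)
    adversarial = np.asarray(adversarial_mask, dtype=bool)
    union = np.logical_or(critical, adversarial).sum()
    if union == 0:
        return 0.0  # neither mask marks anything, so there is nothing to cover
    return float(np.logical_and(critical, adversarial).sum() / union)


# Toy inputs (hypothetical values, not results from the study).
score = impact_score(
    pred_orig=[1, 2, 3, 4],
    conf_orig=[0.9, 0.8, 0.7, 0.6],
    pred_masked=[1, 5, 3, 4],
    conf_masked=[0.85, 0.7, 0.3, 0.5],
)

critical = np.zeros((4, 4), dtype=bool)
critical[0:2, 0:2] = True
adversarial = np.zeros((4, 4), dtype=bool)
adversarial[1:3, 1:3] = True
coverage = impact_coverage(critical, adversarial)
```

In this sketch the predictions and confidences would come from running the network on the original images and on copies with the explanation's critical regions removed; how regions are removed and how the adversarial perturbation is generated is left to the evaluation setup.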

Do Explanations Reflect Decisions? A Machine-centric Strategy to Quantify the Performance of Explainability Algorithms