Abstract: The rationale behind a deep learning model's output is often difficult for humans to understand. Explainable AI (XAI) aims to address this by developing methods that improve the interpretability and explainability of machine learning models. Reliable evaluation metrics are needed to assess and compare different XAI methods. We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods. Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations, which allow a high-precision representation of the contributions of input nodes. We also propose new high-fidelity metrics to quantify the difference between the explanations of the investigated XAI method and those derived from the synthetic model. Our metrics allow explanations to be assessed in terms of precision and recall separately. We also propose metrics to independently evaluate the negative and positive contributions of inputs. Our proposal thus provides deeper insights into the output of XAI methods. We investigate our proposal by constructing a synthetic convolutional image classification model and benchmarking several widely used XAI attribution methods using our evaluation approach. We compare our results with established prior XAI evaluation metrics. By deriving the ground truth directly from the constructed model, we ensure the absence of bias, e.g., subjective bias or bias stemming from the training set. Our experimental results provide novel insights into the performance of the widely used Guided-Backprop and Smoothgrad XAI methods. Both have good precision and recall scores among positively contributing pixels (0.7, 0.76 and 0.7, 0.77, respectively), but poor precision scores among negatively contributing pixels (0.44, 0.61 and 0.47, 0.75, resp.). The recall scores in the latter case remain close. We show that our metrics are among the fastest in terms of execution time.
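To illustrate the idea of scoring precision and recall separately for positively and negatively contributing pixels, the following is a minimal sketch and not the metric definitions used in this work. It assumes that a pixel counts as positively (negatively) contributing when its attribution is strictly greater (less) than zero, and that attribution maps are given as flat lists. The function name `sign_precision_recall` and the toy values are hypothetical.

```python
def sign_precision_recall(explained, ground_truth, sign=1):
    """Precision and recall over pixels whose attribution has the given sign.

    Assumption (not from the paper): a pixel is 'positive' if its
    attribution is > 0 and 'negative' if it is < 0; zeros are neutral.
    sign=1 scores positive contributions, sign=-1 negative ones.
    """
    if len(explained) != len(ground_truth):
        raise ValueError("attribution maps must have the same length")

    # Pixels the XAI method marks with the requested sign.
    predicted = [sign * e > 0 for e in explained]
    # Pixels the ground truth marks with the requested sign.
    actual = [sign * g > 0 for g in ground_truth]

    true_pos = sum(p and a for p, a in zip(predicted, actual))
    n_predicted = sum(predicted)
    n_actual = sum(actual)

    # Define the score as 0.0 when the denominator is empty.
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_actual if n_actual else 0.0
    return precision, recall


# Toy flattened attribution maps (hypothetical values).
ground_truth = [0.9, 0.4, 0.0, -0.3, -0.7, 0.0]
explained = [0.8, 0.0, 0.2, -0.1, 0.3, -0.2]

pos = sign_precision_recall(explained, ground_truth, sign=1)
neg = sign_precision_recall(explained, ground_truth, sign=-1)
print("positive pixels (precision, recall):", pos)
print("negative pixels (precision, recall):", neg)
```

Reporting the two signs separately is what makes it possible to see a method score well on positive contributions while scoring poorly on negative ones, as described above for Guided-Backprop and Smoothgrad.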

Precise Benchmarking of Explainable AI Attribution Methods
