Abstract: The rationale behind a deep learning model's output is often difficult for humans to understand. EXplainable AI (XAI) aims to address this by developing methods that improve the interpretability and explainability of machine learning models. Reliable evaluation metrics are needed to assess and compare different XAI methods. We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods. Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations, allowing a high-precision representation of input node contributions. We also propose new high-fidelity metrics to quantify the difference between the explanations of the investigated XAI method and those derived from the synthetic model. Our metrics allow explanations to be assessed in terms of precision and recall separately. We also propose metrics to independently evaluate negative or positive contributions of inputs. Our proposal provides deeper insights into the output of XAI methods. We investigate our proposal by constructing a synthetic convolutional image classification model and benchmarking several widely used XAI attribution methods using our evaluation approach. We compare our results with established prior XAI evaluation metrics. By deriving the ground truth directly from the constructed model, we ensure the absence of bias, e.g., bias that is subjective or that stems from the training set. Our experimental results provide novel insights into the performance of the widely used Guided-Backprop and Smoothgrad XAI methods. Both have good precision and recall scores among positively contributing pixels (0.7, 0.76 and 0.7, 0.77, respectively), but poor precision scores among negatively contributing pixels (0.44, 0.61 and 0.47, 0.75, respectively). The recall scores in the latter case remain close. We show that our metrics are among the fastest in terms of execution time.
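
The abstract does not define the proposed metrics, so the following is only a minimal sketch of the general idea of scoring an attribution map against a ground truth map with precision and recall, separately for positively and negatively contributing pixels. The function name `sign_precision_recall`, the `threshold` parameter, and the choice to binarize both maps by sign are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def sign_precision_recall(attribution, ground_truth, sign=1, threshold=0.0):
    """Precision and recall over pixels whose contribution has the given sign.

    Hypothetical helper (not the paper's definition): both maps are binarized
    by keeping pixels where sign * value > threshold, then compared as sets.
    Use sign=1 for positive contributions and sign=-1 for negative ones.
    """
    attr = np.asarray(attribution, dtype=float)
    truth = np.asarray(ground_truth, dtype=float)
    if attr.shape != truth.shape:
        raise ValueError("attribution and ground_truth must have the same shape")

    predicted = sign * attr > threshold   # pixels the XAI method marks
    actual = sign * truth > threshold     # pixels the ground truth marks
    true_positives = np.logical_and(predicted, actual).sum()

    # Convention assumed here: an empty mask yields a score of 0.0.
    precision = true_positives / predicted.sum() if predicted.any() else 0.0
    recall = true_positives / actual.sum() if actual.any() else 0.0
    return float(precision), float(recall)


# Toy 2x3 maps, invented for illustration only.
ground_truth = [[1.0, 0.0, -1.0],
                [0.5, 0.0, -2.0]]
attribution = [[0.9, 0.2, -0.1],
               [0.0, 0.0, 0.3]]

pos_precision, pos_recall = sign_precision_recall(attribution, ground_truth, sign=1)
neg_precision, neg_recall = sign_precision_recall(attribution, ground_truth, sign=-1)
print("positive:", pos_precision, pos_recall)
print("negative:", neg_precision, neg_recall)
```

Reporting the two signs separately, as in this sketch, is what lets a method score well on positively contributing pixels while scoring poorly on negatively contributing ones, which is the kind of split the abstract reports for Guided-Backprop and Smoothgrad.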

How does this interaction affect me? Interpretable attribution for feature interactions

Asymmetric feature interaction for interpreting model predictions

Interpretable Artificial Intelligence through the Lens of Feature Interaction

Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency

Enhancing Feature Selection and Interpretability in AI Regression Tasks Through Feature Attribution

Quantifying and Visualizing Attribute Interactions

Disentangling Interactions and Dependencies in Feature Attribution

Distributing Synergy Functions: Unifying Game-Theoretic Interaction Methods for Machine-Learning Explainability

Enhancing Model Interpretability with Local Attribution over Global Exploration

Feature Interactions Reveal Linguistic Structure in Language Models

The Weighted Möbius Score: A Unified Framework for Feature Attribution

Interaction as Explanation: A User Interaction-based Method for Explaining Image Classification Models

Interpreting Classifiers through Attribute Interactions in Datasets

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability

Error-controlled non-additive interaction discovery in machine learning models

Impossibility Theorems for Feature Attribution

Model Interpretation and Explainability: Towards Creating Transparency in Prediction Models

Prospector Heads: Generalized Feature Attribution for Large Models & Data

Precise Benchmarking of Explainable AI Attribution Methods

A Survey of the Interpretability Aspect of Deep Learning Models

Provably Better Explanations with Optimized Aggregation of Feature Attributions