Abstract: Integrated Gradients (IG), one of the most popular explainability methods available, still remains ambiguous in the selection of baseline, which may seriously impair the credibility of the explanations. This study proposes a new uniform baseline, i.e., the Maximum Entropy Baseline, which is consistent with the "uninformative" property of baselines defined in IG. In addition, we propose an improved ablating evaluation approach incorporating the new baseline, where the information conservativeness is maintained. We explain the linear transformation invariance of IG baselines from an information perspective. Finally, we assess the reliability of the explanations generated by different explainability methods and different IG baselines through extensive evaluation experiments.

What problem does this paper attempt to address?

The paper addresses two problems: **the ambiguity in baseline selection and the reliability of interpretation results in the Integrated Gradients (IG) method**. Specifically, the paper points out:

1. **Ambiguity in baseline selection**:
   - The IG method relies on a baseline \( x' \) to calculate the integrated gradient from the baseline to the input \( x \). However, the selection of the baseline lacks clear criteria, and different baselines may lead to significant differences in interpretation results.
   - Commonly used baselines (such as zero-padding, black/white vectors, random initialization, etc.) are effective in some cases, but there is no unified quantitative standard to measure how "uninformative" they are.
2. **Reliability assessment of interpretation results**:
   - The lack of ground truth makes it difficult to assess the reliability of interpretation results.
   - Existing ablation test methods have deficiencies: for example, there are no unified substitute pixels, and information conservation cannot be guaranteed.

To solve these problems, the paper proposes the following improvements:

- **Maximum Entropy Baseline**:
  - A new baseline selection method, called the maximum entropy baseline, is proposed to ensure that the baseline remains "uninformative".
  - It is defined by the formula: \[ B_{X_{\text{entr}}} = \arg\max_x H(\text{Softmax}(f_l(x))) \] where \( f_l(x) \) represents the logits output of the model, and \( H(\cdot) \) is the entropy function: \[ H(A) = -\sum_{i = 1}^n P(a_i)\log P(a_i) \]
- **Improved ablation test**:
  - A new entropy-based ablation test method is proposed that ensures information conservation and uses unified substitute pixels for evaluation.
  - The entropy of the Softmax output over the logits is monitored as a quantitative indicator of the amount of information, which ensures that the amount of information in the input is reduced after ablation. This conforms to the formula: \[ I(B_{X_{\text{entr}}}) \approx \frac{1}{H(\sigma(f_l(B_{X_{\text{entr}}})))} \] where \( \sigma \) denotes the Softmax function.
- **Linear transformation invariance**:
  - The paper analyzes, from an information perspective, why some baselines perform well under uniform linear transformations but fail under non-uniform linear transformations.

Through these improvements, the paper aims to improve the reliability and interpretability of the interpretation results of the IG method.
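The maximum entropy baseline and the inverse-entropy information measure described above can be illustrated with a small sketch. This is not the paper's implementation: the summary does not say how the \( \arg\max \) is solved, so the sketch assumes plain gradient ascent on a toy linear model \( f_l(x) = Wx + b \). The weights `W`, `b`, the step size, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def entropy(p, eps=1e-12):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log)."""
    return float(-np.sum(p * np.log(p + eps)))

def information_proxy(logits):
    """Inverse-entropy proxy I ~ 1 / H(softmax(logits)).

    Not defined when the entropy is exactly zero.
    """
    return 1.0 / entropy(softmax(logits))

def max_entropy_baseline(W, b, x0, lr=0.5, steps=500, eps=1e-12):
    """Gradient ascent on H(softmax(W x + b)) with respect to x.

    Assumption: a toy linear logit model stands in for f_l; a real
    network would need automatic differentiation instead.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        p = softmax(W @ x + b)
        log_p = np.log(p + eps)
        h = -np.sum(p * log_p)
        # Analytic gradient of the entropy w.r.t. the logits:
        # dH/dz_j = -p_j * (log p_j + H)
        grad_logits = -p * (log_p + h)
        # Chain rule through the linear layer: dH/dx = W^T dH/dz
        x += lr * (W.T @ grad_logits)
    return x

# Hypothetical 3-class model on 2 input features.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.5, -0.5, 0.0])
x_input = np.array([2.0, -1.0])

baseline = max_entropy_baseline(W, b, x_input)
h_input = entropy(softmax(W @ x_input + b))
h_baseline = entropy(softmax(W @ baseline + b))
```

For this toy model the logits are all equal at \( x = (-0.5, 0.5) \), so the ascent should end near that point with entropy close to \( \log 3 \), the maximum for three classes. The proxy `information_proxy` is then smaller at the baseline than at the original input, which matches the idea that the baseline carries less information.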

Maximum Entropy Baseline for Integrated Gradients

A New Baseline Assumption of Integated Gradients Based on Shaply value

IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients

Integrated Decision Gradients: Compute Your Attributions Where the Model Makes Its Decision

Generalized Integrated Gradients: A practical method for explaining diverse ensembles

IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution

Unlearning-based Neural Interpretations

Learning Intrinsic Dimension via Information Bottleneck for Explainable Aspect-based Sentiment Analysis

Explainability as statistical inference

The Effective coalitions of Shapley value For Integrated Gradients

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

Four Axiomatic Characterizations of the Integrated Gradients Attribution Method

Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG)

Explaining machine learning models using entropic variable projection

Do Input Gradients Highlight Discriminative Features?

T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients

Rethinking the Role of Gradient-Based Attribution Methods for Model Interpretability

Strengthening Interpretability: An Investigative Study of Integrated Gradient Methods

GLEAMS: Bridging the Gap Between Local and Global Explanations

Model Agnostic Multilevel Explanations