Abstract: As deep vision models' popularity rapidly increases, there is a growing emphasis on explanations for model predictions. The inherently explainable attribution method aims to enhance the understanding of model behavior by identifying the important regions in images that significantly contribute to predictions. It is achieved by cooperatively training a selector (generating an attribution map to identify important features) and a predictor (making predictions using the identified features). Despite many advancements, existing methods suffer from the incompleteness problem, where discriminative features are masked out, and the interlocking problem, where the non-optimized selector initially selects noise, causing the predictor to fit on this noise and perpetuate the cycle. To address these problems, we introduce a new objective that discourages the presence of discriminative features in the masked-out regions, thus enhancing the comprehensiveness of feature selection. A pre-trained detector is introduced to detect discriminative features in the masked-out region. If the selector selects noise instead of discriminative features, the detector can observe and break the interlocking situation by penalizing the selector. Extensive experiments show that our model achieves higher prediction accuracy than the regular black-box model, and produces attribution maps with high feature coverage, localization ability, fidelity and robustness. Our code will be available at https://github.com/Zood123/COMET.

What problem does this paper attempt to address?

This paper addresses two problems in existing inherently explainable deep vision models: the **incompleteness problem** and the **interlocking problem**.

### 1. Incompleteness problem

When generating feature attribution maps, existing methods often fail to cover all discriminative features. For example, some explanation techniques (such as B-cos) may focus on only one part of the target object (such as the beak of a duck) and ignore other important parts. This incompleteness matters most in high-risk scenarios such as medical image diagnosis, where doctors rely on attribution maps to identify all abnormal areas (such as tumors) in CT scans to avoid misdiagnosis.

### 2. Interlocking problem

When the selector and the predictor are trained jointly, a non-optimized selector may initially select noise and mask out discriminative features. The predictor then adapts to this noise, which lowers prediction accuracy and traps training in a local optimum. The result is an "interlock" between the two components: the selector keeps selecting noise because doing so minimizes the prediction loss, while the predictor keeps making predictions based on that noise.

### Solution

To solve these problems, the authors propose a new explainable model named **COMprehensive fEature aTtribution (COMET)**:

- **For the incompleteness problem**: A new objective is introduced that discourages the presence of discriminative features in the masked-out region. This pushes the selector to include as many discriminative features as possible in the attribution map.
- **For the interlocking problem**: A pre-trained feature detector is introduced to detect discriminative features in the masked-out region. If the selector selects noise instead of discriminative features, the detector identifies the ignored features and penalizes the selector, breaking the interlocking cycle between the selector and the predictor.

### Experimental verification

Through experiments on multiple datasets, the authors show that COMET outperforms the baseline models in classification accuracy, and that its attribution maps have higher feature coverage, localization ability, fidelity and robustness.

In summary, this paper aims to improve both the interpretability and the prediction performance of deep vision models by making feature selection more comprehensive and by preventing interlocking between the selector and the predictor.
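The interplay between the prediction loss and the detector penalty described under Solution can be illustrated with a toy sketch. This is not the authors' implementation: the linear "predictor" and "detector", the function name `sketch_loss`, the weight `lam`, and the choice to penalize the detector's probability of the true class on the masked-out features are all assumptions made for illustration. The summary above does not give the exact form of COMET's objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, features):
    """One logit per class; `weights` holds one row per class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def sketch_loss(features, mask, label, predictor_w, detector_w, lam=2.0):
    """Return (prediction_loss, detector_penalty, total) for one sample.

    Assumption: the penalty is the probability that a frozen detector
    assigns to the true class when it sees only the masked-out features.
    """
    selected = [m * x for m, x in zip(mask, features)]
    masked_out = [(1.0 - m) * x for m, x in zip(mask, features)]
    # Predictor sees only what the selector kept.
    prediction_loss = -math.log(softmax(linear_logits(predictor_w, selected))[label])
    # Detector sees only what the selector discarded.
    detector_penalty = softmax(linear_logits(detector_w, masked_out))[label]
    return prediction_loss, detector_penalty, prediction_loss + lam * detector_penalty


# Hypothetical toy data: feature 0 is discriminative, feature 1 is noise.
features = [2.0, 0.5, 0.0, 0.0]
label = 0
# Frozen "pre-trained" detector that responds to the discriminative feature.
detector_w = [[1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
# Predictor that has already fit the noise feature (the interlocked state).
noise_fit_predictor_w = [[0.0, 8.0, 0.0, 0.0], [0.0, -8.0, 0.0, 0.0]]

good_mask = [1.0, 0.0, 0.0, 0.0]   # keeps the discriminative feature
noise_mask = [0.0, 1.0, 0.0, 0.0]  # keeps the noise feature

good_pred, good_pen, good_total = sketch_loss(
    features, good_mask, label, noise_fit_predictor_w, detector_w)
noise_pred, noise_pen, noise_total = sketch_loss(
    features, noise_mask, label, noise_fit_predictor_w, detector_w)
```

In this toy setup the prediction loss alone is lower for the noise mask, because the predictor has already fit the noise, which mirrors the interlocked state. The detector term is larger for the noise mask, and with `lam=2.0` the combined loss favors the mask that keeps the discriminative feature.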

Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
