Abstract: By highlighting important features that contribute to a model's prediction, visual saliency is used as a natural way to interpret the working mechanism of deep neural networks. Numerous methods have been proposed to achieve better saliency results. However, a simple sanity check shows that previous visual saliency methods are not reliable enough to provide meaningful interpretation. In this check, saliency methods are required to explain the output of non-maximum prediction classes, which are usually not ground-truth classes. For example, the methods are asked to interpret an image of "dog" given the wrong class label "fish" as the query. This procedure tests whether the methods reliably interpret the model's predictions based on features that actually appear in the data. Our experiments show that previous methods fail the test by generating similar saliency maps or scattered patterns. Such false saliency responses can be dangerous in certain scenarios, such as medical diagnosis. We find that these failure cases are mainly due to attribution vanishing and adversarial noise within these methods. To learn reliable visual saliency, we propose a simple method that requires the output of the model to stay close to the original output while learning an explanatory saliency mask. To enhance the smoothness of the optimized saliency masks, we then propose a simple Hierarchical Attribution Fusion (HAF) technique. To fully evaluate the reliability of visual saliency methods, we propose a new task, Disturbed Weakly Supervised Object Localization (D-WSOL), which measures whether these methods correctly attribute the model's output to existing features. Experiments show that previous methods fail to meet this standard, and that our approach improves reliability by suppressing false saliency responses. After observing a significant layout difference in saliency masks between real and adversarial samples, we propose to train a simple CNN on these learned hierarchical attribution masks to detect adversarial samples. Experiments show that our method significantly improves detection performance over other approaches.
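
The idea of learning a saliency mask while keeping the model output close to the original output can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear model, the squared-error matching term, the sparsity penalty `lam`, and all names (`forward`, `learn_mask`, `W`, `x`) are assumptions made for illustration, and the HAF step is not included.

```python
import math

def forward(W, x):
    """Toy linear 'model' (an assumption): returns one logit per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def learn_mask(W, x, lam=0.05, lr=0.5, steps=2000):
    """Learn a per-feature mask in (0, 1) by plain gradient descent.

    Loss = ||f(m * x) - f(x)||^2 + lam * sum(m).
    The first term keeps the masked output close to the original output;
    the second (an assumed regularizer) pushes unneeded features toward 0.
    """
    target = forward(W, x)
    theta = [0.0] * len(x)  # mask logits, m = sigmoid(theta)
    for _ in range(steps):
        m = [1.0 / (1.0 + math.exp(-t)) for t in theta]
        masked = [m_i * x_i for m_i, x_i in zip(m, x)]
        resid = [o - t for o, t in zip(forward(W, masked), target)]
        for i in range(len(x)):
            # d(loss)/d(m_i), then chain rule through the sigmoid.
            grad_m = 2.0 * sum(r * row[i] for r, row in zip(resid, W)) * x[i] + lam
            theta[i] -= lr * grad_m * m[i] * (1.0 - m[i])
    return [1.0 / (1.0 + math.exp(-t)) for t in theta]

# Hypothetical 2-class model over 4 features; it ignores features 2 and 3.
W = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, 0.0, 0.0]]
x = [1.0, 1.0, 1.0, 1.0]
mask = learn_mask(W, x)
masked_output = forward(W, [m_i * x_i for m_i, x_i in zip(mask, x)])
```

In this toy setting the mask should stay high on the two features the model actually uses and shrink on the two it ignores, which is the behavior the output-matching constraint is meant to encourage.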

Learning attributions grounded in existing facts for robust visual explanation

Learning Reliable Visual Saliency for Model Explanations

Leveraging saliency priors and explanations for enhanced consistent interpretability

Sim2Word: Explaining Similarity with Representative Attribute Words via Counterfactual Explanations

Graphical Perception of Saliency-based Model Explanations

"Is your explanation stable?": A Robustness Evaluation Framework for Feature Attribution

Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction

Explaining with Counter Visual Attributes and Examples

Provably Better Explanations with Optimized Aggregation of Feature Attributions

Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector

Accurate Explanation Model for Image Classifiers using Class Association Embedding

Improving Network Interpretability via Explanation Consistency Evaluation

Grounding Visual Explanations

Visualizing Global Explanations of Point Cloud DNNs

Enhancing Model Interpretability with Local Attribution over Global Exploration

Towards Visual Saliency Explanations of Face Verification

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Prob-POS: A Framework for Improving Visual Explanations from Convolutional Neural Networks for Remote Sensing Image Classification

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

Counterfactual-based Saliency Map: Towards Visual Contrastive Explanations for Neural Networks

Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals