Abstract: By highlighting the features that contribute most to a model's prediction, visual saliency serves as a natural way to interpret the working mechanism of deep neural networks. Numerous methods have been proposed to achieve better saliency results. However, using a simple sanity check, we find that previous visual saliency methods are not reliable enough to provide meaningful interpretation: the methods are required to explain the output for non-maximum prediction classes, which are usually not the ground-truth classes. For example, a method is asked to interpret an image of a "dog" given the wrong class label "fish" as the query. This procedure tests whether a method reliably interprets the model's predictions based on features that actually appear in the data. Our experiments show that previous methods fail the test, generating either similar saliency maps or scattered patterns. Such false saliency responses can be dangerous in certain scenarios, such as medical diagnosis. We find that these failure cases are mainly due to attribution vanishing and adversarial noise within these methods. To learn reliable visual saliency, we propose a simple method that requires the output of the model to stay close to the original output while learning an explanatory saliency mask. To enhance the smoothness of the optimized saliency masks, we then propose a simple Hierarchical Attribution Fusion (HAF) technique. To fully evaluate the reliability of visual saliency methods, we propose a new task, Disturbed Weakly Supervised Object Localization (D-WSOL), which measures whether these methods correctly attribute the model's output to existing features. Experiments show that previous methods fail to meet this standard, and that our approach improves reliability by suppressing false saliency responses. After observing a significant layout difference in saliency masks between real and adversarial samples, we propose to train a simple CNN on these learned hierarchical attribution masks to detect adversarial samples. Experiments show that our method significantly improves detection performance over other approaches.
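
The mask-learning idea in the abstract (optimize a saliency mask while keeping the model's output close to its original output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy linear-softmax classifier, the zero baseline for masked-out features, the squared-error closeness term, the sparsity weight `lam`, and the names `learn_mask` and `loss_and_grad` are all assumptions made for illustration. HAF and D-WSOL are not covered.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_and_grad(theta, x, W, p0, lam):
    """Loss and gradient w.r.t. theta for a toy linear-softmax model (assumed)."""
    m = 1.0 / (1.0 + np.exp(-theta))     # mask values in (0, 1)
    q = softmax(W @ (m * x))             # model output on the masked input
    g = 2.0 * (q - p0)                   # derivative of the squared-error term w.r.t. q
    dz = q * (g - q @ g)                 # backpropagate through softmax
    dm = x * (W.T @ dz) + lam            # backpropagate through masking, plus sparsity term
    loss = np.sum((q - p0) ** 2) + lam * m.sum()
    return loss, dm * m * (1.0 - m)      # chain rule through the sigmoid

def learn_mask(x, W, lam=0.01, lr=1.0, steps=200):
    """Gradient descent on the mask; a step is kept only if it lowers the loss."""
    p0 = softmax(W @ x)                  # original output to stay close to
    theta = np.zeros_like(x)             # mask starts at 0.5 everywhere
    loss, grad = loss_and_grad(theta, x, W, p0, lam)
    history = [loss]
    for _ in range(steps):
        cand = theta - lr * grad
        cand_loss, cand_grad = loss_and_grad(cand, x, W, p0, lam)
        if cand_loss < loss:
            theta, loss, grad = cand, cand_loss, cand_grad
        else:
            lr *= 0.5                    # shrink the step when it does not help
        history.append(loss)
    return 1.0 / (1.0 + np.exp(-theta)), history

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))              # hypothetical 3-class, 8-feature model
x = rng.normal(size=8)                   # hypothetical input
mask, history = learn_mask(x, W)
```

In this sketch the closeness term plays the role of the output constraint, while the sparsity term pushes the mask to keep only the features needed to preserve the output; a real image model would replace the linear classifier and would typically rely on automatic differentiation.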

Ada-Sal Network: Emulate the Human Visual System

Neural Network With Saliency Based Feature Selection Ability

Improve Neural Network Using Saliency.

PUB-SalNet: A Pre-Trained Unsupervised Self-Aware Backpropagation Network for Biomedical Salient Segmentation

CSA-Net: Deep Cross-Complementary Self Attention and Modality-Specific Preservation for Saliency Detection

Accurate salient object detection via dense recurrent connections and residual-based hierarchical feature integration.

Salient Object Detection Based on Visual Perceptual Saturation and Two-Stream Hybrid Networks.

A Deep Spatial Contextual Long-term Recurrent Convolutional Network for Saliency Detection

Saliency Detection With a Three-Stage Hierarchical Network

AWANet: Attentive-Aware Wide-Kernels Asymmetrical Network with Blended Contour Information for Salient Object Detection

Deep supervised visual saliency model addressing low-level features

Learning Reliable Visual Saliency for Model Explanations

Spatio-Temporal Self-Attention Network for Video Saliency Prediction

Saliency Detection by Forward and Backward Cues in Deep-Cnn

Multi-Color Space Network for Salient Object Detection

Contextual Encoder-Decoder Network for Visual Saliency Prediction

An End-to-End Network for Co-Saliency Detection in One Single Image

Adaptive Group-wise Consistency Network for Co-saliency Detection

SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation

SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection

Top-Down Saliency Detection Driven by Visual Classification