29th British Machine Vision Conference (BMVC 2018)

Corresponding authors: Zhifan Gao (gaozhifan@gmail.com) and Heye Zhang (hy.zhang@siat.ac.cn)

Funding: the National Natural Science Foundation of China (No. 61525106, 61427807, 61771464) and Shenzhen innovation funding (JCYJ20170307165309009, JCYJ20170413114916687, SGLH20161212104605195)

© 2018. The copyright of this document resides with its authors.

Abstract: Saliency detection has attracted growing research interest in recent years because many computer vision applications must first derive object attention from images. Multi-scale awareness is essential for a saliency detector to find thin and small attention regions while preserving high-level semantics. In this paper, we propose a novel holistic and deep feature pyramid neural network architecture that leverages multi-scale semantics in both the feature encoding stage and the saliency region prediction (decoding) stage. In the encoding stage, we exploit the multi-scale, pyramidal hierarchy of feature maps via a densely connected network with variable-size dilated convolutions and pyramid pooling. In the decoding stage, we fuse multi-level feature maps via up-sampling and convolution. In addition, we apply multi-level deep supervision by attaching a loss function at every feature fusion level. Multi-loss supervision regularizes the weight search space across the different tasks, which reduces overfitting, and strengthens the gradient signal during backpropagation, which enables us to train the network from scratch. This architecture builds inherently multi-level, semantic, pyramidal feature maps at different scales and enhances the model's capability in the saliency detection task. We validated our approach on six benchmark datasets and compared it with eleven state-of-the-art methods. The results demonstrate the effectiveness of the design and show that our approach outperformed the compared methods.
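The multi-level deep supervision mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary cross-entropy loss at each fusion level and nearest-neighbour up-sampling to the ground-truth resolution; the abstract specifies neither, and the names `upsample_nearest`, `bce`, and `deep_supervision_loss` are hypothetical helpers written for this illustration, not the authors' implementation.

```python
import math

def upsample_nearest(pred, factor):
    """Nearest-neighbour up-sampling of a 2D list by an integer factor."""
    out = []
    for row in pred:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equally sized 2D lists."""
    total, count = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count

def deep_supervision_loss(side_outputs, target, weights=None):
    """Weighted sum of per-level losses (assumed BCE, equal weights by default).

    Each side output is a square saliency map whose size divides the
    target size; it is up-sampled to the target size before the loss.
    """
    weights = weights or [1.0] * len(side_outputs)
    total = 0.0
    for pred, w in zip(side_outputs, weights):
        factor = len(target) // len(pred)
        total += w * bce(upsample_nearest(pred, factor), target)
    return total

# Toy example: a 4x4 ground-truth mask and predictions at three scales.
target = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
coarse = [[0.5]]                    # 1x1, uninformative
middle = [[1.0, 0.0], [0.0, 0.0]]   # 2x2, matches the mask
fine = [[float(v) for v in row] for row in target]  # 4x4, matches the mask
loss = deep_supervision_loss([coarse, middle, fine], target)
```

Because every level contributes its own loss term, each fusion level receives a direct gradient signal in a real network, which is the effect the abstract attributes to multi-loss supervision.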

From Seed Discovery to Deep Reconstruction

Learning Stereoscopic Visual Attention Model for 3D Video

Saliency In Crowd

Revisiting Video Saliency Prediction in the Deep Learning Era

Holistic and Deep Feature Pyramids for Saliency Detection.

Two-Stage Learning to Predict Human Eye Fixations Via SDAEs

What Do Deep Saliency Models Learn about Visual Attention?

How Drones Look: Crowdsourced Knowledge Transfer for Aerial Video Saliency Prediction.

Revisiting Video Saliency: A Large-scale Benchmark and a New Model

Video Saliency Prediction using Spatiotemporal Residual Attentive Networks.

Enriched Feature Representation and Combination for Deep Saliency Detection

Unsupervised Discovery of Crowd Activities by Saliency-Based Clustering

Deep Learning for Video Saliency Detection.

Spatio-Temporal Self-Attention Network for Video Saliency Prediction

Deep supervised visual saliency model addressing low-level features

Video Salient Object Detection via Fully Convolutional Networks

DMRA: Depth-Induced Multi-Scale Recurrent Attention Network for RGB-D Saliency Detection

Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling.

Background Prior-Based Salient Object Detection Via Deep Reconstruction Residual.

Human Vision Attention Mechanism-Inspired Temporal-Spatial Feature Pyramid for Video Saliency Detection

Crowd Aware Summarization of Surveillance Videos by Deep Reinforcement Learning