Abstract:This paper advocates a novel video saliency detection method based on the spatial-temporal saliency fusion and low-rank coherency guided saliency diffusion. In sharp contrast to the conventional methods, which conduct saliency detection locally in a frame-by-frame way and could easily give rise to incorrect low-level saliency map, in order to overcome the existing difficulties, this paper proposes to fuse the color saliency based on global motion clues in a batch-wise fashion. And we also propose low-rank coherency guided spatial-temporal saliency diffusion to guarantee the temporal smoothness of saliency maps. Meanwhile, a series of saliency boosting strategies are designed to further improve the saliency accuracy. First, the original long-term video sequence is equally segmented into many short-term frame batches, and the motion clues of the individual video batch are integrated and diffused temporally to facilitate the computation of color saliency. Then, based on the obtained saliency clues, inter-batch saliency priors are modeled to guide the low-level saliency fusion. After that, both the raw color information and the fused low-level saliency are regarded as the low-rank coherency clues, which are employed to guide the spatial-temporal saliency diffusion with the help of an additional permutation matrix serving as the alternative rank selection strategy. Thus, it could guarantee the robustness of the saliency map's temporal consistence, and further boost the accuracy of the computed saliency map. Moreover, we conduct extensive experiments on five public available benchmarks, and make comprehensive, quantitative evaluations between our method and 16 state-of-the-art techniques. All the results demonstrate the superiority of our method in accuracy, reliability, robustness, and versatility.

Learning Coupled Convolutional Networks Fusion for Video Saliency Prediction

Multi-Scale Spatiotemporal Feature Fusion Network for Video Saliency Prediction

Deep Fusion Based Video Saliency Detection

Video Saliency Detection Using Deep Convolutional Neural Networks.

Deep Audio-Visual Fusion Neural Network for Saliency Estimation.

Video Saliency Detection Via Spatial-Temporal Fusion and Low-Rank Coherency Diffusion

SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection

GFNet: Gated Fusion Network for Video Saliency Prediction

Video Saliency Estimation via Encoding Deep Spatiotemporal Saliency Cues

Progressively Real-Time Video Salient Object Detection Via Cascaded Fully Convolutional Networks with Motion Attention.

Cross-Modality Fusion and Progressive Integration Network for Saliency Prediction on Stereoscopic 3D Images

Video Saliency Prediction Based on Spatial-Temporal Two-Stream Network

DCFusion: Dual-Headed Fusion Strategy and Contextual Information Awareness for Infrared and Visible Remote Sensing Image

A Deep-Learning Based Feature Hybrid Framework for Spatiotemporal Saliency Detection Inside Videos

Saliency Fusion Via Sparse and Double Low Rank Decomposition

Audiovisual Saliency Prediction Via Deep Learning

Video Saliency Prediction Via Joint Discrimination and Local Consistency

DeepVS: A Deep Learning Based Video Saliency Prediction Approach

A Spatial-Frequency-temporal Domain Based Saliency Model for Low Contrast Video Sequences

Assessment of Feature Fusion Strategies in Visual Attention Mechanism for Saliency Detection.

Video Saliency Prediction using Spatiotemporal Residual Attentive Networks.