Abstract:Inspired by the human vision attention mechanism, the human vision system uses multilevel features to extract accurate visual saliency information, so multilevel features are important for saliency detection. On the basis of the numerous biological frameworks for visual information processing, we find that better combination and use of multilevel features with time information can greatly improve the accuracy of the video saliency model. The proposed TSFP-Net has the advantages of much higher prediction precision, simple structure, second smallest size, and the third fastest running time compared to the state-of-the-art methods. The encoder extracts multiscale temporal-spatial features from the input continuous video frames and then constructs a temporal-spatial feature pyramid through temporal-spatial convolution and top-down feature integration. The decoder performs hierarchical decoding of temporal-spatial features from different scales and finally produces a saliency map from the integration of multiple video frames. Our model is simple yet effective and can run in real time. We perform abundant experiments, and the results indicate that the well-designed structure can significantly improve the precision of video saliency detection. Experimental results on three purely visual video saliency benchmarks demonstrate that our method outperforms the existing state-of-the-art methods.

Nonlocal Feature Transform with Iterative Refinement for Video Saliency Detection

Video Saliency Detection Using Motion Saliency Filter

Region-based Saliency Estimation for 3D Shape Analysis and Understanding

Color and motion information fusion based saliency filter for video

Saliency Detection Via Local Single Gaussian Model

Salient Region Detection Algorithm Via KL Divergence and Multi-scale Merging

Efficient Classification Using Salient Regions

Video Saliency Detection Using Dynamic Fusion of Spatial-Temporal Features in Complex Background with Disturbance

Video saliency detection based on robust seeds generation and spatio-temporal propagation

Predictive Video Saliency Detection.

Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement

Human Vision Attention Mechanism-Inspired Temporal-Spatial Feature Pyramid for Video Saliency Detection

Video Saliency Detection Algorithm Based on Motion Spectral Residual

Salient region detection : Integrate both global and local cues

Saliency Detection Based on Multiple-Level Feature Learning

A biologically inspired computational model for image saliency detection.

Co-Saliency Detection Via Local Prediction and Global Refinement.

Global feature integration based salient region detection

Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling.

Video Saliency Detection Using Object Proposals

Visual Saliency Detection: an Information Theoretic Algorithm Combined Long-term with Short-term Features