Abstract: This paper advocates a novel learning solution for modeling long-term spatial-temporal saliency consistency in order to boost the accuracy of video saliency detection. Conventional methods typically utilize a “slack” spatial-temporal model to locally ensure the smoothness of the computed video saliency, yet they easily run into a performance tradeoff dilemma (i.e., detection accuracy versus integrity). In contrast, our approach adopts a bilevel learning strategy to globally exploit the saliency consistency while overcoming this difficulty. Our method starts with the frame-wise contrast computation of low-level saliency clues. Then, based on these saliency clues, we devise a novel bilevel Markov Random Field (bMRF) solution to conduct semantic labeling, which explicitly indicates both the salient foregrounds and the non-salient nearby surroundings with high confidence while shrinking the low-confidence remainder. In this way, the spatial-temporal consistency constraint is embedded intrinsically into the explicit semantic labels, and the performance tradeoff problem is avoided. Next, based on the semantic labels produced by our bMRF method, we further propose learning multiple non-linear feature transformations to enlarge the feature margin between the salient foregrounds and the non-salient nearby surroundings; the key rationale is to resort to long-term common consistencies to enforce the spatial-temporal smoothness. Thus, we can use these learned non-linear feature transformations to simultaneously suppress short-term false alarms and correct hollow effects. To validate our approach, we conduct extensive experiments on five publicly available benchmarks and make comprehensive quantitative evaluations between our method and 17 state-of-the-art techniques. All of the results demonstrate our method's advantages in terms of accuracy, reliability, robustness, and versatility.
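
The abstract does not specify how the frame-wise low-level contrast is computed. As a rough illustration of that first step only, the sketch below scores each region of a single frame by its spatially weighted color contrast against all other regions. The function name `contrast_saliency`, the region-level inputs, the Gaussian spatial weighting, and the value of `sigma` are all assumptions made for this example; this is not the authors' formulation, and it does not cover the bMRF labeling or the learned feature transformations.

```python
import math


def contrast_saliency(colors, centers, sigma=0.25):
    """Per-region contrast saliency for one frame (illustrative sketch).

    colors:  list of (r, g, b) mean colors, one per region (hypothetical input)
    centers: list of (x, y) region centers, normalized to [0, 1]
    sigma:   spatial falloff of the weighting (assumed value)
    Returns a list of saliency scores rescaled to [0, 1].
    """
    n = len(colors)
    raw = []
    for i in range(n):
        score = 0.0
        for j in range(n):
            if i == j:
                continue
            # Color difference between the two regions.
            color_dist = math.dist(colors[i], colors[j])
            # Nearby regions contribute more than distant ones.
            spatial_dist = math.dist(centers[i], centers[j])
            weight = math.exp(-(spatial_dist ** 2) / (2 * sigma ** 2))
            score += weight * color_dist
        raw.append(score)

    # Min-max normalization; a uniform frame has no contrast at all.
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * n
    return [(s - lo) / (hi - lo) for s in raw]


# Three gray background regions and one red region in the middle.
colors = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]
centers = [(0.2, 0.2), (0.8, 0.2), (0.2, 0.8), (0.5, 0.5)]
saliency = contrast_saliency(colors, centers)
```

In this toy frame the red region receives the highest score because it differs in color from every neighbor, while the gray regions contrast only with the red one. Per-frame scores of this kind are only the raw clues that the later stages described in the abstract would refine.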

Bilevel Feature Learning for Video Saliency Detection