Bilevel Feature Learning for Video Saliency Detection
Chenglizhao Chen, Shuai Li, Hong Qin, Zhenkuan Pan, Guowei Yang
DOI: https://doi.org/10.1109/tmm.2018.2839523
IF: 7.3
2018-01-01
IEEE Transactions on Multimedia
Abstract: This paper advocates a novel learning solution for modeling long-term spatial-temporal saliency consistency in order to boost the accuracy of video saliency detection. Conventional methods typically utilize a “slack” spatial-temporal model to locally ensure the smoothness of the computed video saliency, yet they easily encounter a performance tradeoff dilemma (i.e., detection accuracy versus integrity). In contrast, our approach adopts a bilevel learning strategy to globally exploit the saliency consistency while overcoming this difficulty. Our method starts with the contrast computation of low-level saliency clues in a frame-wise manner. Then, based on the obtained saliency clues, we devise a novel bilevel Markov Random Field (bMRF) solution to conduct semantic labelling, which explicitly indicates both the salient foregrounds and the non-salient nearby surroundings with high confidence while shrinking the remaining low-confidence regions. In this way, the spatial-temporal consistency constraint is embedded intrinsically into the explicit semantic labels, and the performance tradeoff problem is prevented from occurring. Next, based on the semantic labels produced by our bMRF method, we further propose learning multiple non-linear feature transformations to enlarge the feature margin between the salient foregrounds and the non-salient nearby surroundings; the key rationale is to resort to long-term common consistencies to enforce the spatial-temporal smoothness. Thus, we can utilize these learned non-linear feature transformations to simultaneously suppress short-term false alarms and correct hollow effects. To validate our new approach, we conduct extensive experiments on five publicly available benchmarks, and make comprehensive, quantitative evaluations between our method and 17 state-of-the-art techniques.
All of the results demonstrate our method's advantages in terms of accuracy, reliability, robustness, and versatility.
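The first two stages described in the abstract (frame-wise contrast computation, then labelling only the high-confidence regions) can be illustrated with a minimal sketch. This is not the authors' bMRF method: it assumes a simple global contrast (absolute difference from the frame mean) as the low-level saliency clue and replaces the bilevel Markov Random Field with plain thresholding. The function names `contrast_saliency` and `label_by_confidence` and the thresholds `low` and `high` are hypothetical, chosen only for illustration.

```python
def contrast_saliency(frame):
    """Per-pixel global contrast for one grayscale frame (list of rows).

    Assumption: saliency is the absolute difference from the frame mean,
    normalized to [0, 1]. The paper's actual contrast features are not
    specified in the abstract.
    """
    pixels = [v for row in frame for v in row]
    mean = sum(pixels) / len(pixels)
    contrast = [[abs(v - mean) for v in row] for row in frame]
    peak = max(c for row in contrast for c in row)
    if peak == 0:
        # A uniform frame has no contrast, so nothing is salient.
        return [[0.0 for _ in row] for row in frame]
    return [[c / peak for c in row] for row in contrast]


def label_by_confidence(saliency, low=0.3, high=0.7):
    """Three-way labelling by confidence (a stand-in for the bMRF step).

    1  = salient foreground (high confidence)
    0  = non-salient surroundings (high confidence)
    -1 = low-confidence region, left unlabelled
    """
    labels = []
    for row in saliency:
        labels.append([1 if s >= high else 0 if s <= low else -1 for s in row])
    return labels


# Usage: one bright pixel on a dark background.
frame = [
    [0, 0, 0],
    [0, 8, 0],
    [0, 0, 0],
]
labels = label_by_confidence(contrast_saliency(frame))
```

In this toy frame the bright center pixel receives label 1 and the background receives label 0. In the paper, such explicit labels are then used to learn feature transformations that separate the two classes over long time spans, which this sketch does not attempt.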