Improving Video Saliency Detection Via Localized Estimation and Spatiotemporal Refinement.

Xiaofei Zhou,Zhi Liu,Chen Gong,Wei Liu
DOI: https://doi.org/10.1109/tmm.2018.2829605
IF: 7.3
2018-01-01
IEEE Transactions on Multimedia
Abstract:Video saliency detection aims to pop out the most salient regions in every frame of a video. Up to now, many efforts have been made from various aspects for video saliency detection. Unfortunately, the existing video saliency models are very likely to fail in challenging videos with complicated motions and complex scenes. Therefore, in this paper, we propose a novel framework to improve the saliency detection results generated by existing video saliency models. The proposed framework consists of three key steps including localized estimation, spatiotemporal refinement, and saliency update. Specifically, the initial saliency map of each frame in a video is first generated by using an existing saliency model. Then, by considering the temporal consistency and strong correlation among adjacent frames, the localized estimation models, which are generated by training the random forest regressor within a local temporal window, are employed to generate the temporary saliency map. Finally, by taking the appearance and motion information of salient objects into consideration, the spatiotemporal refinement step is deployed to further improve the temporary saliency map and generate the final saliency map. Furthermore, such an improved saliency map is then utilized to update the initial saliency map and provide reliable cues for saliency detection in the next frame. The experimental results on four challenging datasets demonstrate that the proposed framework is able to consistently and significantly improve the saliency detection performance of various video saliency models, thereby achieving the state-of-the-art performance.
What problem does this paper attempt to address?