A Unified Video Semantics Extraction and Noise Object Suppression Network for Video Saliency Detection

Zhenshan Tan, Xiaodong Gu
DOI: https://doi.org/10.1007/978-3-031-44195-0_28
2023-01-01
Abstract: Video salient object detection (VSOD) aims to segment the most attractive objects from a video sequence. Exploring video semantics and suppressing noise objects are two challenges in VSOD. In this paper, we propose a unified end-to-end network with video Semantics Extraction and Noise Object suppression (SENO). SENO has two modules: a video semantics module (VSM) and a contrastive learning module (CLM). VSM extracts video semantics by computing global pixel correspondences, which locates the salient objects in the video. CLM pulls video foregrounds closer together and pushes interference objects away, which enhances effective video salient features and suppresses noise objects. CLM is applied only during training, so it adds no overhead at inference. In addition, SENO does not rely on pre-processing temporal modeling techniques such as optical flow, which avoids the high computational cost and accumulated inaccuracies of these complex models. Experimental results on five benchmark testing datasets show that SENO outperforms state-of-the-art methods. SENO also runs in real time.
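The two ideas named in the abstract, global pixel correspondence and contrastive pull/push, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes cosine similarity with a softmax for correspondence and a generic InfoNCE-style loss for the contrastive term. All function names, the `temperature` parameter, and the toy feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_correspondence(query_pixels, reference_pixels):
    """Assumed form of 'global pixel correspondence': each query pixel gets a
    softmax distribution over ALL reference pixels, not just a local window."""
    return [
        softmax([cosine(q, r) for r in reference_pixels])
        for q in query_pixels
    ]


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's loss):
    small when the anchor is close to positives and far from negatives."""
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    return -sum(math.log(p / denom) for p in pos) / len(pos)


# Toy 2-D features: one foreground-like vector and one distractor.
anchor = [1.0, 0.0]
foreground = [[0.9, 0.1]]
distractor = [[0.0, 1.0]]

weights = global_correspondence([anchor], foreground + distractor)
loss_aligned = contrastive_loss(anchor, foreground, distractor)
loss_swapped = contrastive_loss(anchor, distractor, foreground)
```

In this toy setup the anchor's correspondence weight concentrates on the foreground-like vector, and the loss is lower when the foreground is treated as the positive than when the roles are swapped. Because a loss of this kind is only needed to shape features during training, it can be dropped at inference, which is consistent with the abstract's statement that CLM adds no inference overhead.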