Motion Context Guided Edge-preserving Network for Video Salient Object Detection

Kan Huang, Chunwei Tian, Zhijing Xu, Nannan Li, Jerry Chun-Wei Lin
DOI: https://doi.org/10.1016/j.eswa.2023.120739
IF: 8.5
2023-01-01
Expert Systems with Applications
Abstract: Video salient object detection aims to extract the most conspicuous objects in a video sequence, which facilitates various video processing tasks, e.g., video compression and video recognition. Although remarkable progress has been made in video salient object detection, most existing methods still suffer from coarse object boundaries, which may hinder their use in real-world applications. To alleviate this problem, we propose a Motion Context guided Edge-preserving Network (MCE-Net) for video salient object detection. MCE-Net generates temporally consistent salient edges, which are then leveraged to refine the salient object regions completely and uniformly. The core innovation in MCE-Net is an Asymmetric Cross-Reference Module (ACRM), which is designed to exploit the cross-modal complementarity between spatial structure and motion flow, facilitating robust salient object edge extraction. To leverage the extracted edge features for salient object refinement, we fuse them with multi-level spatial–temporal embeddings in a parallel guidance manner, generating the final saliency results. The proposed method is end-to-end trainable, and the edge annotations are generated automatically from ground-truth saliency maps. Experimental evaluations on five widely used benchmarks demonstrate that the proposed method achieves superior performance to other leading methods. Moreover, the experimental results indicate that our method preserves salient objects with clear boundary structures in video sequences.
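The abstract states that edge annotations are generated automatically from ground-truth saliency maps but does not say how. One common way to do this, shown here only as a minimal sketch and not as the authors' procedure, is to mark every foreground pixel that has at least one background 4-neighbour. The function name `edge_from_mask`, the 4-neighbourhood, and the choice to ignore neighbours outside the image are all assumptions made for illustration.

```python
from typing import List

Mask = List[List[int]]  # binary saliency map: 1 = salient, 0 = background


def edge_from_mask(mask: Mask) -> Mask:
    """Hypothetical helper: derive a binary edge map from a binary mask.

    A pixel is an edge pixel if it is foreground and at least one of its
    in-bounds 4-neighbours is background. Neighbours outside the image
    are ignored (an assumption, not taken from the paper).
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    edge = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # background pixels are never edge pixels
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                    edge[y][x] = 1
                    break
    return edge


if __name__ == "__main__":
    demo = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in edge_from_mask(demo):
        print(row)
```

For real saliency maps, the same idea is usually expressed as a morphological gradient over an image array; the pure-Python loop above only keeps the sketch free of dependencies.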