Semantic-Enhanced Point-Box Joint Prompting for Video Object Segmentation

Quan Zhao, Siying Wu, Yueyi Zhang, Xiaoyan Sun
DOI: https://doi.org/10.1109/icip51287.2024.10648107
2024-01-01
Abstract: The Segment Anything Model (SAM) has demonstrated outstanding zero-shot performance in image segmentation using efficient point and box prompts. In this paper, we propose a SAM-based Semantic-enhanced Point-Box joint prompting (SAM-SPB) framework for Video Object Segmentation (VOS). SAM-SPB leverages both the local structure information and the global semantic cues of objects of interest, yielding strong and robust segmentation. Specifically, the local structure information of the objects is maintained by a point tracking branch, and the semantic consistency of the objects across frames is propagated by our proposed semantic-aware, memory-based box tracking branch. In contrast to the previous SAM-based, point-centric video segmentation method, we highlight the importance of joint point-box prompting for video object segmentation. State-of-the-art results on popular VOS benchmarks in the zero-shot setting demonstrate the strong zero-shot ability of the proposed method.
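The abstract describes the two branches only at a high level, so the following is a hypothetical illustration of how a point-box joint prompt could be assembled, not the authors' implementation. It assumes the box branch chooses, among candidate boxes with precomputed embeddings, the one with the highest mean cosine similarity to a bounded FIFO memory of past object embeddings, and that the point branch supplies tracked point coordinates with visibility flags. The names `SemanticMemory`, `select_box`, and `joint_prompt` are invented for this sketch; the output keys mirror the `point_coords`, `point_labels`, and `box` arguments of the original SAM `SamPredictor.predict` interface, but the model call itself is omitted.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels
Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


class SemanticMemory:
    """Hypothetical FIFO memory of past object embeddings (oldest evicted first)."""

    def __init__(self, capacity: int = 5) -> None:
        self.bank: deque = deque(maxlen=capacity)

    def add(self, emb: Embedding) -> None:
        self.bank.append(list(emb))

    def score(self, emb: Embedding) -> float:
        # Mean cosine similarity between emb and every stored embedding.
        if not self.bank:
            return 0.0
        return sum(cosine(emb, m) for m in self.bank) / len(self.bank)


def select_box(memory: SemanticMemory,
               candidates: List[Tuple[Box, Embedding]]) -> Box:
    """Pick the candidate box whose embedding best matches the memory, then store it."""
    if not candidates:
        raise ValueError("no candidate boxes")
    best_box, best_emb = max(candidates, key=lambda c: memory.score(c[1]))
    memory.add(best_emb)  # update memory with the selected object's embedding
    return best_box


def joint_prompt(points: List[Point], visible: List[bool], box: Box) -> Dict:
    """Combine visible tracked points (as positive clicks) with the selected box."""
    coords = [p for p, v in zip(points, visible) if v]
    return {"point_coords": coords, "point_labels": [1] * len(coords), "box": box}
```

For example, with a memory seeded by a first-frame embedding `[1.0, 0.0]`, `select_box` prefers a candidate embedded as `[0.9, 0.1]` over one embedded as `[0.0, 1.0]`. Keeping visible points as positive clicks and dropping occluded ones is one plausible way to let the box carry the object's global extent while the points pin down local structure.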