Adaptive Multi-scale Iterative Optimized Video Object Segmentation Based on Correlation Enhancement

Xuejun Li, Yuan Zong, Wenming Zheng
DOI: https://doi.org/10.1109/tcsvt.2024.3445717
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Semi-supervised video object segmentation (VOS) is a highly challenging task that relies on the mask of the initial frame as the segmentation reference for classifying each pixel in the subsequent frames of a video sequence. However, the guidance provided by the first frame is limited because segmentation targets are diverse and their appearance changes unpredictably. Consequently, it is crucial to retain useful information during segmentation and to use it for iterative model optimization, enabling the model to adapt to rapidly changing segmentation targets. In this work, we propose a multi-scale adaptive model optimization strategy that incorporates a contextual relevance enhancement module, which strengthens object correlation by emphasizing feature similarity across adjacent frames. Additionally, we introduce a keyframe discrimination module to handle segmentation in scenarios involving significant target changes. We further introduce a multi-scale memory screening module that automatically screens and selects global-local optimization features to preserve the model’s generalization performance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the DAVIS and large-scale YouTube-VOS 2018/2019 datasets without relying on synthetic training data or first-frame fine-tuning.