Self-supervised Video Object Segmentation Using Motion Feature Compensation

Tianqi Zhang, Bo Li
DOI: https://doi.org/10.1007/978-3-031-44195-0_41
2023-01-01
Abstract: Video object segmentation is a popular area of research in computer vision. Traditional models are trained on annotated data, which is both time-consuming and expensive to produce. Training models in an unsupervised manner has been proposed as a solution to this issue. However, previous works have focused only on spatial features extracted by self-supervised learning methods, without considering the temporal information between frames. In this paper, we propose a new video object segmentation model, the motion feature compensation (MFC) model, which uses self-supervised learning to extract spatial features and incorporates a motion feature, extracted from optical flow, to compensate for the missing temporal information. Additionally, we introduce an attention-based fusion method to merge features from both modalities. Notably, for each training video, we select only two consecutive frames at random to train our model. The YouTube-VOS and DAVIS-2017 datasets are used as the training and validation datasets, respectively. The experimental results demonstrate that our approach outperforms previous methods, validating our proposed design. The source code is available at: https://github.com/CVisionProcessing/MFC.
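The sampling strategy mentioned in the abstract (two consecutive frames chosen at random from each training video) can be illustrated with a minimal sketch. This is not the authors' code: the function names `sample_consecutive_pair` and `sample_training_pairs`, the dictionary of video lengths, and the `seed` parameter are all hypothetical, introduced here only for illustration.

```python
import random


def sample_consecutive_pair(num_frames, rng=random):
    """Return indices (t, t + 1) of two consecutive frames chosen at random.

    Hypothetical helper: assumes frames are indexed 0 .. num_frames - 1.
    """
    if num_frames < 2:
        raise ValueError("a video needs at least two frames to form a pair")
    # The first frame of the pair can be any frame except the last one.
    t = rng.randrange(num_frames - 1)
    return t, t + 1


def sample_training_pairs(video_lengths, seed=None):
    """Pick one random consecutive frame pair per video.

    video_lengths maps a (hypothetical) video name to its frame count.
    """
    rng = random.Random(seed)
    return {
        name: sample_consecutive_pair(num_frames, rng)
        for name, num_frames in video_lengths.items()
    }
```

For example, `sample_training_pairs({"video_a": 30, "video_b": 120}, seed=0)` would return one frame pair per video, such as an index pair `(t, t + 1)` for each name. How the sampled frames are then passed to the optical flow and feature extraction stages is not specified in the abstract.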
What problem does this paper attempt to address?