Learning Memory Propagation And Matching For Semi-Supervised Video Object Segmentation

Jiale Wang, Hongli Xu, Kexuan Fan, Hui Yin, Longfei Xia
DOI: https://doi.org/10.21203/rs.3.rs-1218266/v1
2022-01-01
Abstract: This paper studies semi-supervised video object segmentation (VOS). Several works have shown the outstanding performance of matching-based memory retrieval, which performs pixel-level matching across space and time but ignores the temporal relationship between frames. To address this, we propose a memory propagation and matching (MPM) method that combines the propagation-based and matching-based approaches. This reduces erroneous matches, maintains consistency between adjacent frames, and makes the model more robust to occlusion and to objects disappearing and reappearing. Inspired by the success of recurrent neural network (RNN) based methods in video tasks, we propose a memory propagation (MP) module that uses a Convolutional Gated Recurrent Unit (ConvGRU) to propagate memory and refines the memory as each target frame is segmented. At the same time, MPM matches the target frame against both the first frame and the previous adjacent frame. A multi-object matching (MOM) module computes, for every pixel, the probability that it belongs to each object, so that MPM can effectively distinguish between different objects. Experiments show that MPM achieves a J&F score of 82.8% on the DAVIS 2017 validation set and 80.1% on YouTube-VOS.
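The abstract names three mechanisms without giving their equations: ConvGRU-based memory propagation, matching the target frame against the first and previous frames, and a multi-object step that turns matching scores into per-pixel object probabilities. The NumPy sketch below illustrates one plausible version of each. The following assumptions are ours, not the paper's: a bias-free ConvGRU with random weights, cosine-similarity nearest-neighbour matching per object, equal averaging of first-frame and previous-frame scores, and a temperature softmax over objects (with background counted as object 0). The names `conv2d_same`, `ConvGRUCell`, `object_similarity`, `multi_object_probs` and the temperature `tau` are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k), k odd."""
    c_out, _, k, _ = w.shape
    p = k // 2
    H, W = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, H, W))
    for dy in range(k):
        for dx in range(k):
            # Shifted window of the padded input, mixed across channels by one kernel tap.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + H, dx:dx + W])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvGRUCell:
    """Generic ConvGRU cell (biases omitted); an assumed stand-in for the paper's MP module."""
    def __init__(self, c_in, c_hid, k=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (c_hid, c_in + c_hid, k, k)
        self.w_z, self.w_r, self.w_h = (rng.normal(0.0, 0.1, shape) for _ in range(3))

    def __call__(self, x, h):
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.w_z))  # update gate
        r = sigmoid(conv2d_same(xh, self.w_r))  # reset gate
        h_tilde = np.tanh(conv2d_same(np.concatenate([x, r * h], axis=0), self.w_h))
        return (1 - z) * h + z * h_tilde        # propagated memory state

def object_similarity(q, ref, ref_labels, n_obj):
    """Per target pixel, the best cosine similarity to any reference pixel of each object.
    q, ref: (C, H, W) features; ref_labels: (H, W) ints in [0, n_obj). Returns (n_obj, H, W)."""
    C, H, W = q.shape
    qn = q.reshape(C, -1)
    qn = qn / (np.linalg.norm(qn, axis=0, keepdims=True) + 1e-8)
    rn = ref.reshape(C, -1)
    rn = rn / (np.linalg.norm(rn, axis=0, keepdims=True) + 1e-8)
    sim = qn.T @ rn                              # (HW_target, HW_ref) pixel-level matching
    labels = ref_labels.reshape(-1)
    out = np.full((n_obj, H * W), -1.0)          # -1 = lowest possible cosine similarity
    for o in range(n_obj):
        cols = labels == o
        if cols.any():
            out[o] = sim[:, cols].max(axis=1)
    return out.reshape(n_obj, H, W)

def multi_object_probs(q, first, first_labels, prev, prev_labels, n_obj, tau=0.1):
    """Match against the first and previous frames, then softmax over the object axis."""
    s = 0.5 * (object_similarity(q, first, first_labels, n_obj)
               + object_similarity(q, prev, prev_labels, n_obj))
    logits = s / tau
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=0, keepdims=True)      # probabilities sum to 1 at every pixel

# Toy usage on random features (no trained weights, purely illustrative).
rng = np.random.default_rng(1)
C, H, W, n_obj = 4, 8, 8, 3
frames = rng.normal(size=(5, C, H, W))
cell = ConvGRUCell(c_in=C, c_hid=C)
mem = np.zeros((C, H, W))
for f in frames:
    mem = cell(f, mem)                           # propagate memory frame by frame
labels_first = rng.integers(0, n_obj, size=(H, W))
labels_prev = rng.integers(0, n_obj, size=(H, W))
probs = multi_object_probs(frames[-1], frames[0], labels_first, frames[-2], labels_prev, n_obj)
print(probs.shape)  # (3, 8, 8)
```

Averaging the two reference frames is the simplest choice; a learned weighting, or concatenating both score maps for a decoder, would be equally consistent with the abstract, which also does not say how the propagated memory feeds into matching. The softmax over the object axis is what makes the scores a per-pixel probability distribution, letting competing objects be separated at every location.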