CRNet: Collaborative Refinement Network for Self-Supervised Video Object Segmentation

Dexiang Hong, Guorong Li, Bineng Zhong, Zhenjun Han, Li Su, Qingming Huang
DOI: https://doi.org/10.1109/mipr54900.2022.00037
2022-01-01
Abstract: Most self-supervised methods rely solely on a point-to-point correspondence strategy to propagate masks through a video sequence. However, pixel-level matching alone is insufficient and often produces noisy masks. To mitigate this problem, we propose a collaborative refinement network (CRNet) for self-supervised video object segmentation. CRNet consists of two modules: a memory retrieval module and a collaborative refinement module. The memory retrieval module performs point-to-point correspondence and produces a propagated mask for the query frame. The collaborative refinement module aggregates the reference and query information and implicitly learns the collaborative relationship between them in order to refine the output of the memory retrieval module. The whole model is trained in a self-supervised manner on unlabeled video data, without any human annotation. Extensive experiments on DAVIS-17 and YouTube-VOS demonstrate that CRNet surpasses state-of-the-art self-supervised methods and narrows the gap with fully supervised methods.
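As a rough illustration of the point-to-point correspondence idea mentioned in the abstract, the following minimal sketch propagates labels from reference pixels to query pixels through a softmax over feature similarities. It is not the authors' implementation: the function name `propagate_mask`, the cosine-similarity affinity, and the `temperature` value are all assumptions chosen for illustration, and the learned collaborative refinement step is not modeled.

```python
import math


def propagate_mask(ref_feats, ref_labels, qry_feats, temperature=0.07):
    """Hypothetical sketch of affinity-based mask propagation.

    ref_feats:  list of reference pixel feature vectors
    ref_labels: list of per-pixel label distributions (one per reference pixel)
    qry_feats:  list of query pixel feature vectors
    Returns one label distribution per query pixel.
    """
    def normalize(v):
        # L2-normalize so the dot product below is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    ref = [normalize(v) for v in ref_feats]
    num_classes = len(ref_labels[0])
    propagated = []
    for q in (normalize(v) for v in qry_feats):
        # Similarity of this query pixel to every reference pixel.
        logits = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref]
        # Numerically stable softmax over reference pixels.
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        # The query label is the affinity-weighted average of reference labels.
        propagated.append([
            sum(w * lab[k] for w, lab in zip(weights, ref_labels)) / total
            for k in range(num_classes)
        ])
    return propagated


# Toy usage: two reference pixels with one-hot labels, two query pixels.
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_labels = [[1.0, 0.0], [0.0, 1.0]]
qry_feats = [[0.1, 0.9], [0.8, 0.2]]
result = propagate_mask(ref_feats, ref_labels, qry_feats)
```

Because each query pixel is labeled independently from its own matches, a few wrong matches translate directly into wrong labels, which is the kind of noise the abstract attributes to pixel-level matching.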