Object-based spatial similarity for semi-supervised video object segmentation
Bofei Wang,Chengjian Zheng,Ning Wang,Shunfei Wang,Xiaofeng Zhang,Shaoli Liu,Si Gao,Kaidi Lu,Diankai Zhang,Lin Shen,Yukang Wang,Yongchao Xu,J Luiten,P Voigtlaender,B Leibe,M Tran,T Le,TV Nguyen,T Ton,T Hoang,N Bui,T Do,Q Luong,V Nguyen,DA Duong,MN Do,SW Oh,J Lee,N Xu,SJ Kim,A Robinson,FJ Lawin,M Danelljan,M Felsberg,H Guo,W Wang,G Guo,H Li,J Liu,Q He,X Xiao,SW Oh,J Lee,N Xu,SJ Kim,Y Heo,YJ Koh,C Kim,Z Lin,J Xie,C Zhou,J Hu,W Zheng,H Ren,Y Yang,X Liu,IE Zulfikar,J Luiten,B Leibe,Z Yang,Q Wang,S Bai,W Hu,PHS Torr
2019-01-01
Abstract: Video object segmentation (VOS) is a fundamental task in computer vision. In this paper, we present a two-stage semi-supervised VOS method, which segments the objects in a video given only the annotations of the first frame. In the first stage, we employ a state-of-the-art instance segmentation approach followed by an improved merging method that also takes Object-based Spatial Similarity (OSS) into account. This produces a preliminary segmentation of the video sequence. In the second stage, we propose a novel Adaptive Reference-frame Selection (ARS) algorithm based on OSS, which reduces object-ID mismatches under object occlusion and deformation. With ARS, a reliable reference frame for each object is selected dynamically and adaptively during tracking and segmentation. The preliminary segmentation is then refined based on the corresponding reference frame. We evaluate the proposed algorithm on the DAVIS 2017 Video Object Segmentation Benchmark and achieve first place in the DAVIS 2019 Semi-Supervised Challenge with a J&F mean score of 76.7%.
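The idea of selecting a reference frame adaptively can be illustrated with a minimal sketch. The abstract does not define OSS or the ARS update rule, so everything below is an assumption made for illustration: plain mask IoU stands in for OSS, the names `mask_iou` and `select_reference_frames` and the `threshold` parameter are hypothetical, and the rule "promote a frame to reference only when it is similar enough to the current reference" is one plausible reading, not the authors' algorithm.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]   # (row, col) of a foreground pixel
Mask = Set[Pixel]         # one object's mask in one frame


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two masks (stand-in for OSS, an assumption)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_reference_frames(masks: List[Mask], threshold: float = 0.5) -> List[int]:
    """For each frame, return the index of the reference frame it is compared to.

    Hypothetical rule: a frame whose mask is similar enough to the current
    reference is treated as reliable and becomes the new reference; otherwise
    (e.g. under occlusion) the old reference is kept.
    """
    ref = 0                      # the annotated first frame is the initial reference
    refs: List[int] = []
    for t, mask in enumerate(masks):
        refs.append(ref)         # frame t is judged against the current reference
        if t > 0 and mask_iou(masks[ref], mask) >= threshold:
            ref = t              # reliable frame: promote it to reference
    return refs


# Toy sequence for a single object: slight deformation, occlusion, reappearance.
frames: List[Mask] = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},
    {(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)},
    {(5, 5)},
    {(0, 1), (1, 0), (1, 1), (2, 1)},
]
reference_indices = select_reference_frames(frames)
```

In this toy sequence the third frame does not overlap its reference at all, so it is never promoted, and the fourth frame is still compared against the second frame rather than against the unreliable one.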