Enhanced Memory Network for Video Segmentation

Zhishan Zhou, Lejian Ren, Pengfei Xiong, Yifei Ji, Peisen Wang, Haoqiang Fan, Si Liu
DOI: https://doi.org/10.1109/iccvw.2019.00083
2019-01-01
Abstract: This paper proposes an Enhanced Memory Network (EMN) for semi-supervised video object segmentation. Space-Time Memory Networks have demonstrated the effectiveness of making abundant use of guidance information. To further improve accuracy on unknown and small targets, we propose to perform fine-grained segmentation based on a correlation attention map. We introduce a siamese network to obtain the semantic similarity and relevance between the tracked objects and the whole image. The feature map that the siamese network extracts from the cropped image is multiplied onto the whole-image feature map as attention for the proposal objects. In addition, an ASPP module is employed to enlarge the semantic receptive field and further improve segmentation accuracy at different scales. Using multi-object combination and a multi-scale ensemble, the proposed algorithm achieved first place in the YouTube-VOS 2019 Semi-supervised Video Object Segmentation Challenge with a J&F mean score of 81.8%.
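The correlation attention step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `correlation_attention`, the global average pooling of the crop features, the cosine similarity, and the sigmoid gate are all assumptions chosen to make the idea concrete. The abstract states only that features from the cropped image are multiplied onto the whole feature map as attention.

```python
import numpy as np

def correlation_attention(whole_feat, crop_feat):
    """Hypothetical sketch. whole_feat: (C, H, W); crop_feat: (C, h, w).

    Returns the attended feature map (C, H, W) and the attention map (H, W).
    """
    # Assumption: summarize the cropped-object features by global average pooling.
    template = crop_feat.mean(axis=(1, 2))                      # (C,)
    # Assumption: cosine similarity between the template and each spatial location.
    t = template / (np.linalg.norm(template) + 1e-8)
    f = whole_feat / (np.linalg.norm(whole_feat, axis=0, keepdims=True) + 1e-8)
    sim = np.einsum("c,chw->hw", t, f)                          # (H, W), values in [-1, 1]
    # Assumption: a sigmoid turns the similarity into a soft gate in (0, 1).
    attn = 1.0 / (1.0 + np.exp(-sim))
    # Multiply the attention onto the whole feature map, broadcasting over channels.
    attended = whole_feat * attn[None, :, :]
    return attended, attn
```

Under these assumptions, locations whose features resemble the cropped object receive weights closer to 1, while dissimilar locations are attenuated but not zeroed out.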