E3SN: Efficient End-to-End Siamese Network for Video Object Segmentation.

Meng Lan, Yipeng Zhang, Qinning Xu, Lefei Zhang
DOI: https://doi.org/10.24963/ijcai.2020/98
2020-01-01
Abstract: In the semi-supervised video object segmentation (VOS) field, SiamMask has achieved competitive accuracy and the fastest running speed. However, its two-stage training procedure requires additional manual intervention, and its use of only single-level features does not fully exploit the rich hierarchical feature information. This paper proposes an efficient end-to-end Siamese network for VOS. In particular, a supervised sampling strategy is designed to optimize the training procedure, which allows the entire model to be trained in an end-to-end manner. Moreover, a multilevel feature aggregation module is developed to enhance feature representability and improve segmentation accuracy. Experimental results on the DAVIS2016 and DAVIS2017 datasets show that the proposed approach outperforms SiamMask in accuracy at a similar FPS. The approach also achieves a good accuracy-speed trade-off compared with other state-of-the-art VOS algorithms.
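The abstract does not describe how the multilevel feature aggregation module works internally. As a rough illustration of the general idea only, the sketch below resizes feature maps from several network depths to a common resolution and stacks them as channels. It is not the authors' module: the nearest-neighbor resizing, the stacking step, and the names `upsample_nearest` and `aggregate_levels` are all assumptions made for this example.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, stored as H rows of W values


def upsample_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Resize a single-channel map to (out_h, out_w) by nearest-neighbor lookup."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def aggregate_levels(levels: List[FeatureMap]) -> List[FeatureMap]:
    """Hypothetical aggregation: bring every level to the resolution of the
    first (finest) map and return the resized maps stacked as channels."""
    out_h, out_w = len(levels[0]), len(levels[0][0])
    return [upsample_nearest(level, out_h, out_w) for level in levels]


# Toy input: a 4x4 shallow map, a 2x2 mid-level map, and a 1x1 deep map.
shallow = [[float(4 * r + c) for c in range(4)] for r in range(4)]
middle = [[10.0, 20.0], [30.0, 40.0]]
deep = [[7.0]]

fused = aggregate_levels([shallow, middle, deep])
print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a real network the stacked channels would typically pass through learned layers before mask prediction; that part is omitted here because the abstract gives no details about it.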