SiamPolar: Realtime Video Object Segmentation with Polar Representation in Traffic Scenes

Yuhui Hong, Yaochen Li, Chao Zhu, Ying Zhang, Yuehu Liu
DOI: https://doi.org/10.1109/ITSC48978.2021.9565074
2021-01-01
Abstract: Video object segmentation (VOS) is an essential part of intelligent transportation systems (ITS). In real traffic scenes, both stable accuracy and real-time speed are important metrics. In this paper, we propose a semi-supervised real-time video object segmentation method based on a Siamese network with a novel polar representation. The polar representation reduces the number of parameters needed to encode a mask, which significantly increases inference speed. In addition, an asymmetric Siamese network is designed to extract features at different spatial scales. To reduce the antagonism among the branches of the head, we propose the idea of peeling convolution. Based on this idea, repeated cross-correlation and a semi-FPN are designed as the neck of the network. Experiments on the DAVIS-2016 dataset show that SiamPolar achieves 71.4% J-mean at 59.2 fps. On the TSD-max dataset of real traffic scenes, SiamPolar achieves 80.5% J-mean.
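To illustrate why a polar representation needs few parameters, here is a minimal sketch of one common way to encode a mask in polar form: a center point plus the lengths of rays cast at uniformly spaced angles. The abstract does not give the details of SiamPolar's encoding, so the centroid as center, the 36 rays, the ray-marching step, and the function names `encode_polar` and `decode_polar` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def encode_polar(mask, n_rays=36, step=0.5):
    """Encode a binary mask (list of rows of 0/1) as a center plus ray lengths.

    Assumption: the center is the mask centroid and rays are uniformly spaced.
    """
    h, w = len(mask), len(mask[0])
    pixels = [(x, y) for y in range(h) for x in range(w) if mask[y][x]]
    if not pixels:
        raise ValueError("mask is empty")
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)

    lengths = []
    for k in range(n_rays):
        theta = 2.0 * math.pi * k / n_rays
        dx, dy = math.cos(theta), math.sin(theta)
        r = 0.0
        # March outward until the next sample leaves the mask or the image.
        while True:
            x = int(round(cx + (r + step) * dx))
            y = int(round(cy + (r + step) * dy))
            if not (0 <= x < w and 0 <= y < h) or not mask[y][x]:
                break
            r += step
        lengths.append(r)
    return (cx, cy), lengths


def decode_polar(center, lengths):
    """Turn a center and ray lengths back into polygon vertices (x, y)."""
    cx, cy = center
    n = len(lengths)
    return [
        (cx + r * math.cos(2.0 * math.pi * k / n),
         cy + r * math.sin(2.0 * math.pi * k / n))
        for k, r in enumerate(lengths)
    ]
```

Under these assumptions, a mask is described by `n_rays + 2` numbers (38 for 36 rays) regardless of image resolution, whereas a dense binary mask needs one value per pixel. The trade-off is that a single ray per angle can only approximate shapes that are not star-convex around the chosen center.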