Integrating Long-Short Term Network for Efficient Video Object Segmentation.

Jingjing Wang, Zhu Teng, Baopeng Zhang, Jianping Fan
2020-01-01
Abstract: Real-world application of video object segmentation (VOS) is a very challenging problem, especially for multi-object segmentation. Deep-learning-based approaches have recently dominated VOS by fine-tuning networks on the first frame to capture object dynamics, but this can lead to impractically low frame rates and a risk of overfitting. To overcome this limitation, we develop an efficient, fully end-to-end model for fast and accurate VOS, named the Long-Short Term Network (LSTNet). It contains a long-term network that encodes absolute object variations and a short-term network that captures relative object dynamics. The segmentation results for video objects are obtained directly by an attentional gate operation built on these two networks. The proposed model runs at high speed and handles multi-object segmentation without post-processing. Extensive experiments on widely used benchmarks, including YouTube-VOS and DAVIS 2017, demonstrate that the proposed model achieves competitive accuracy and speed compared with a number of state-of-the-art methods.
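The abstract says only that LSTNet combines the outputs of a long-term and a short-term network through an attentional gate, and that it handles multiple objects without post-processing; it does not give the gate's form. A minimal sketch of one plausible gated fusion follows, assuming scalar per-pixel score maps, a sigmoid gate g = sigmoid(w_long*L + w_short*S + bias) that mixes the two maps as g*L + (1 - g)*S, and a per-pixel argmax across object maps to produce the label map. The names `attentional_gate` and `label_pixels`, and all weights, are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List

Grid = List[List[float]]  # one score per pixel, row-major


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attentional_gate(long_map: Grid, short_map: Grid,
                     w_long: float = 1.0, w_short: float = 1.0,
                     bias: float = 0.0) -> Grid:
    """Fuse long-term and short-term score maps with a per-pixel gate.

    Hypothetical formulation (not taken from the paper):
        g     = sigmoid(w_long * L + w_short * S + bias)
        fused = g * L + (1 - g) * S
    Because g lies in (0, 1), each fused value is a convex combination of L and S.
    """
    fused: Grid = []
    for row_l, row_s in zip(long_map, short_map):
        out_row = []
        for l, s in zip(row_l, row_s):
            g = sigmoid(w_long * l + w_short * s + bias)
            out_row.append(g * l + (1.0 - g) * s)
        fused.append(out_row)
    return fused


def label_pixels(per_object: List[Grid]) -> List[List[int]]:
    """Assign each pixel the index of the object map with the highest score."""
    h, w = len(per_object[0]), len(per_object[0][0])
    return [[max(range(len(per_object)), key=lambda k: per_object[k][i][j])
             for j in range(w)]
            for i in range(h)]


# Usage: background (index 0) and one object on a 1x2 frame.
background = attentional_gate([[0.9, 0.1]], [[0.7, 0.2]])
obj1 = attentional_gate([[0.1, 0.8]], [[0.3, 0.9]])
print(label_pixels([background, obj1]))  # → [[0, 1]]
```

Taking a per-pixel argmax over all object maps at once is one simple way to label several objects in a single pass; whether LSTNet does exactly this is not stated in the abstract.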