Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection

Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam
DOI: https://doi.org/10.1007/978-3-030-01252-6_44
2018-01-01
Abstract: This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM). A Pyramid Dilated Convolution (PDC) module is first designed for simultaneously extracting spatial features at multiple scales. These spatial features are then concatenated and fed into an extended Deeper Bidirectional ConvLSTM (DB-ConvLSTM) to learn spatiotemporal information. Forward and backward ConvLSTM units are placed in two layers and connected in a cascaded way, encouraging information flow between the bi-directional streams and leading to deeper feature extraction. We further augment DB-ConvLSTM with a PDC-like structure, by adopting several dilated DB-ConvLSTMs to extract multi-scale spatiotemporal information. Extensive experimental results show that our method outperforms previous video saliency models by a large margin, with a real-time speed of 20 fps on a single GPU. With unsupervised video object segmentation as an example application, the proposed model (with CRF-based post-processing) achieves state-of-the-art results on two popular benchmarks, demonstrating its superior performance and high applicability.
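The multi-scale idea behind the PDC module can be illustrated with a minimal, dependency-free sketch: the same input is convolved at several dilation rates and the resulting maps are stacked, which stands in for channel-wise concatenation. This is not the authors' implementation. The function names, the single-channel input, the fixed kernel, and the dilation rates used here are all assumptions chosen for illustration; the actual model uses learned multi-channel convolutions on top of a deep backbone.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(image: Grid, kernel: Grid, rate: int) -> Grid:
    """Single-channel 'same' cross-correlation with zero padding.

    `rate` is the dilation: kernel taps are spaced `rate` pixels apart,
    so a larger rate covers a wider receptive field with the same kernel.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)          # assumes a square, odd-sized kernel
    c = k // 2               # index of the kernel centre
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + (i - c) * rate
                    xx = x + (j - c) * rate
                    # Taps that fall outside the image contribute zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def pyramid_dilated_features(
    image: Grid, kernel: Grid, rates: Sequence[int] = (1, 2, 4)
) -> List[Grid]:
    """Apply the same kernel at each dilation rate and stack the maps.

    The returned list plays the role of concatenated feature channels.
    The default rates are hypothetical, not taken from the paper.
    """
    return [dilated_conv2d(image, kernel, r) for r in rates]


if __name__ == "__main__":
    # A 5x5 image with a single bright pixel in the centre.
    img = [[0.0] * 5 for _ in range(5)]
    img[2][2] = 1.0
    box = [[1.0] * 3 for _ in range(3)]
    for rate, fmap in zip((1, 2), pyramid_dilated_features(img, box, (1, 2))):
        print(f"rate={rate}")
        for row in fmap:
            print(row)
```

Running the script shows that the rate-1 map responds only in the 3x3 neighbourhood of the bright pixel, while the rate-2 map responds at positions two pixels away, which is the sense in which different rates capture different spatial scales.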