Dual Correlation Network for Efficient Video Semantic Segmentation

Shumin An, Qingmin Liao, Zongqing Lu, Jing-Hao Xue
DOI: https://doi.org/10.1109/tcsvt.2023.3298644
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Video data pose a major challenge for semantic segmentation because of their large volume and strong inter-frame redundancy. In this paper, we propose a dual local and global correlation network tailored for efficient video semantic segmentation. It consists of three modules: 1) a local attention based module, which measures correlation and aggregates features within a local region between a key frame and a non-key frame; 2) a consistent constraint module, which considers long-range correlation among pixels from a global view to promote intra-frame semantic consistency of the non-key frame; and 3) a key frame decision module, which selects key frames adaptively based on the ability to transfer features. Extensive experiments on the Cityscapes and CamVid video datasets demonstrate that the proposed method reduces inference time significantly while maintaining high accuracy. The implementation is available at https://github.com/An01168/DCNVSS.
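To make the idea behind the first module concrete, the following is a minimal NumPy sketch of local-window attention between two feature maps. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function name `local_attention_aggregate`, the `radius` parameter, the scaled dot-product correlation, and the softmax weighting are all hypothetical choices made here for the example.

```python
import numpy as np

def local_attention_aggregate(key_feat, query_feat, radius=1):
    """Aggregate key-frame features for a non-key frame (illustrative only).

    key_feat, query_feat: arrays of shape (C, H, W).
    Each query pixel attends to a (2*radius+1)^2 window of key-frame pixels.
    """
    c, h, w = key_feat.shape
    k = 2 * radius + 1
    offsets = [(dy, dx) for dy in range(k) for dx in range(k)]

    # Zero-pad the key-frame features so border pixels have full windows.
    padded = np.pad(key_feat, ((0, 0), (radius, radius), (radius, radius)))
    # Stack the k*k shifted views of the key frame: (k*k, C, H, W).
    neighbours = np.stack([padded[:, dy:dy + h, dx:dx + w] for dy, dx in offsets])

    # Correlation: scaled dot product between each query pixel and its neighbours.
    scores = (neighbours * query_feat[None]).sum(axis=1) / np.sqrt(c)

    # Mask positions that fall in the padding so they receive zero weight.
    valid = np.pad(np.ones((h, w), dtype=bool), radius)
    mask = np.stack([valid[dy:dy + h, dx:dx + w] for dy, dx in offsets])
    scores = np.where(mask, scores, -np.inf)

    # Numerically stable softmax over the window dimension.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Weighted sum of neighbouring key-frame features: (C, H, W).
    return (weights[:, None] * neighbours).sum(axis=0)
```

With `radius=0` each pixel attends only to its co-located key-frame pixel, so the function returns `key_feat` unchanged; larger radii let the aggregation tolerate small motion between the two frames.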
engineering, electrical & electronic