Video Semantic Segmentation With Distortion-Aware Feature Correction
Jiafan Zhuang, Zilei Wang, Bingke Wang
DOI: https://doi.org/10.1109/tcsvt.2020.3037234
IF: 5.859
2021-08-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Video semantic segmentation aims to generate an accurate semantic map for each frame in a video. For such a task, per-frame image segmentation is generally unacceptable in practice because of its high computation cost. To address this issue, many works perform flow-based feature propagation to reuse the features of previous frames, which essentially exploits the content continuity of consecutive frames. However, the estimated optical flow is inevitably inaccurate, which distorts the propagated features. In this article, we propose a distortion-aware feature correction method with the goal of improving video segmentation performance at low computational cost. Our core idea is to correct the features in distorted regions using the current frame while preserving the propagated features in other regions. In this way, a lightweight network is enough to achieve promising segmentation results. In particular, we propose to predict the distorted regions by utilizing the consistency of distortion patterns in images and features, so that the high-cost feature extraction from current frames can be avoided. We conduct extensive experiments on Cityscapes, CamVid, and UAVid, and the results show that our proposed method significantly outperforms previous methods and achieves state-of-the-art performance in both segmentation accuracy and speed. Code and pretrained models are available at https://github.com/jfzhuang/DAVSS.
engineering, electrical & electronic
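The core idea stated in the abstract (propagate features with optical flow, then replace them only where the flow is unreliable) can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: in the paper the distorted regions and the corrected features come from learned networks, whereas here the distortion mask and the current-frame features are simply given as inputs. The function names `warp_nearest` and `correct_features`, the nearest-neighbor warping, and the flow convention are all hypothetical choices made for illustration.

```python
import numpy as np


def warp_nearest(feat_prev, flow):
    """Backward-warp previous-frame features with nearest-neighbor sampling.

    feat_prev: array of shape (C, H, W).
    flow:      array of shape (2, H, W); flow[0] is the x displacement and
               flow[1] the y displacement from each current-frame location
               to its source location in the previous frame (assumed
               convention for this sketch).
    """
    _, h, w = feat_prev.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Round to the nearest source pixel and clamp to the image border.
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, h - 1)
    return feat_prev[:, src_y, src_x]


def correct_features(propagated, current, distortion_mask):
    """Use current-frame features where the mask is 1, propagated elsewhere.

    propagated, current: arrays of shape (C, H, W).
    distortion_mask:     array of shape (H, W) with values in [0, 1].
    """
    mask = distortion_mask[None].astype(propagated.dtype)
    return mask * current + (1.0 - mask) * propagated


# Toy data: 2 channels on a 3x4 grid.
feat_prev = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# Every location takes its feature from one pixel to the right.
flow = np.zeros((2, 3, 4))
flow[0] = 1.0
warped = warp_nearest(feat_prev, flow)

# Stand-in for features computed from the current frame.
current = -np.ones_like(feat_prev)

# Mark a single location as distorted.
mask = np.zeros((3, 4))
mask[0, 0] = 1.0

fused = correct_features(warped, current, mask)
```

Only the location flagged in `mask` takes its value from `current`; every other location keeps the warped feature, which is why the correction step can stay cheap when distorted regions are small.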