Efficient Semantic Segmentation for Compressed Video

Cai Jiaxin,Li Qi,Shen Yulin,Pan Jia,Liu Wenxi
DOI: https://doi.org/10.1109/icra57147.2024.10610435
2024-01-01
Abstract:Robots, constrained by limited onboard computing resources, often encounter situations wherein high-resolution and high-bit-rate videos captured by their cameras necessitate compression before further analysis. In this paper, we propose a novel video semantic segmentation paradigm for compressed video. Specifically, our framework draws the inspiration from the principle of Wavelet Transform, and thus we design the network structure, WTDecomNet, approximating the decomposition of high-resolution image into its low-resolution counterpart and axial details. The aim is to well preserve the image content through decomposition and maintain model efficiency by obtaining semantics from low-resolution image. To facilitate this purpose, we propose an efficient axial subband approximation module for extracting axial details and a lightweight temporal alignment module for associating keyframes and non-keyframes of compressed video. Through comprehensive experiments, we show that our model can achieve the state-of-the-art performance on public benchmarks. Especially on CamVid, comparing to baseline, our proposed model reduces the computational overhead by 70% while improving mIoU by 4%.
What problem does this paper attempt to address?