Bimodal Feature Propagation and Fusion for Real-time Semantic Segmentation on RGB-D Images

Ying He, Li Xiao, Zhigang Sun, Zhuo Wang
DOI: https://doi.org/10.1109/icsp54964.2022.9778300
2022-01-01
Abstract: Semantic segmentation is the foundation of scene understanding and autonomous driving tasks. One of its challenges is that feature resolution decreases as the network goes deeper. In this paper, low-resolution features are progressively integrated into high-resolution features, enabling high-resolution prediction. Currently, incorporating auxiliary depth information into a semantic segmentation framework has proven helpful for improving accuracy. However, the additional processing of depth data greatly increases computational complexity, which limits practical application. Here, we investigate a real-time bimodal semantic segmentation network that effectively extracts and fuses complementary information. Our network extends a lightweight architecture that can seamlessly take advantage of various prevalent backbone networks. It extracts RGB and depth features in two parallel branches and integrates multi-level features of the two modalities through bimodal feature fusion blocks. To reduce computational complexity, we employ depth-wise separable convolution while keeping performance intact. Meanwhile, an attention mechanism is used to adaptively recalibrate the fused features. Refinement components are retained to refine multi-resolution feature maps. We conduct an ablation study to demonstrate the effectiveness of our bimodal fusion blocks. Comprehensive experiments show that the proposed network outperforms state-of-the-art methods and achieves 18 FPS on the challenging RGB-D dataset NYUDv2.
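The efficiency and fusion ideas in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and the exact layout of their fusion block is not given here. The function `se_fuse`, the choice to fuse by element-wise addition, and the squeeze-and-excitation-style channel attention (biases omitted) are hypothetical stand-ins chosen for illustration. The two counting functions use the standard weight counts for a k x k convolution (k*k*C_in*C_out) and for a depth-wise separable one, i.e. a k x k depth-wise conv plus a 1 x 1 point-wise conv (k*k*C_in + C_in*C_out).

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # feature map indexed as [channel][row][col]


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    return k * k * c_in + c_in * c_out


def se_fuse(rgb: Tensor3D, depth: Tensor3D,
            w1: List[List[float]], w2: List[List[float]]) -> Tensor3D:
    """Hypothetical bimodal fusion (assumption, not the paper's block):
    add the RGB and depth maps, then rescale each channel with
    squeeze-and-excitation style attention. Biases are omitted.
    w1 has shape [C // r][C]; w2 has shape [C][C // r]."""
    # Element-wise sum of the two modalities.
    fused = [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
             for ca, cb in zip(rgb, depth)]
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fused]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1).
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Recalibrate: scale every value in channel c by s[c].
    return [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(fused)]


if __name__ == "__main__":
    std = conv_params(3, 256, 256)          # 589824
    sep = dw_separable_params(3, 256, 256)  # 67840
    print(std, sep)
    # With all-zero attention weights every channel gate is sigmoid(0) = 0.5.
    out = se_fuse([[[1.0, 2.0]], [[3.0, 4.0]]], [[[1.0, 0.0]], [[1.0, 0.0]]],
                  [[0.0, 0.0]], [[0.0], [0.0]])
    print(out)
```

For a 3 x 3 layer with 256 input and 256 output channels, the separable form needs 67,840 weights instead of 589,824, roughly 8.7 times fewer. The ratio is 1/C_out + 1/k^2, so the saving grows with kernel size and width. The paper's actual layer sizes are not stated in the abstract.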