DCFNet: Dense Complementary Fusion for RGB-Thermal Urban Scene Perception

Yu-Wen Michael Zhang, Gang Zhang, Xiaolin Hu
DOI: https://doi.org/10.1007/978-981-97-4399-5_30
2024-01-01
Abstract: Semantic segmentation is an essential task in computer vision. Conventional methods rely primarily on clear RGB images from visible-light cameras, which confines their input to a single modality. While effective under adequate lighting, these approaches falter in low-light conditions. To address this issue, some studies have integrated RGB and thermal images, fusing their features by basic concatenation or addition. However, such fusion often transfers information inadequately and overlooks the complementary information that each modality provides. This paper introduces DCFNet, a novel semantic segmentation approach that combines RGB and thermal images in a complementary manner using the newly introduced Complementary Vote Attention (CVA) module. Our comprehensive experimental analysis shows that DCFNet surpasses state-of-the-art (SOTA) models. On the MFNet dataset, our model achieves a mean Intersection over Union (mIoU) of 60.1%, a substantial improvement over the SOTA of 1.2% in all-day scenarios and 2.4% in nighttime scenes.
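For readers unfamiliar with the terms used in the abstract, the following is a minimal NumPy sketch of the two baseline fusion operations it mentions (channel-wise concatenation and element-wise addition) and of the mean Intersection over Union metric. It is not the DCFNet implementation and does not reproduce the CVA module. The function names, the (C, H, W) feature layout, and the per-image evaluation are assumptions made for illustration; benchmark evaluations usually accumulate statistics over a whole dataset before averaging.

```python
import numpy as np


def fuse_by_concat(rgb_feat, thermal_feat):
    """Baseline fusion: stack the two feature maps along the channel axis.

    Assumes both inputs are shaped (C, H, W); the result is (2C, H, W).
    """
    return np.concatenate([rgb_feat, thermal_feat], axis=0)


def fuse_by_add(rgb_feat, thermal_feat):
    """Baseline fusion: element-wise sum; the shape stays (C, H, W)."""
    return rgb_feat + thermal_feat


def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union between two integer label maps.

    Classes that appear in neither map are skipped so that they do not
    contribute an undefined 0/0 term to the average.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


if __name__ == "__main__":
    # Toy feature maps (hypothetical sizes, for shape illustration only).
    rgb = np.ones((4, 2, 2))
    thermal = np.zeros((4, 2, 2))
    print(fuse_by_concat(rgb, thermal).shape)
    print(fuse_by_add(rgb, thermal).shape)

    # Toy 2x2 label maps with two classes.
    pred = np.array([[0, 1], [1, 1]])
    target = np.array([[0, 1], [0, 1]])
    print(mean_iou(pred, target, num_classes=2))
```

Both baselines treat the two modalities symmetrically at every pixel, which is the limitation the abstract attributes to them: neither operation decides, per location, which modality is more informative.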