Dual-branch deep cross-modal interaction network for semantic segmentation with thermal images

Kang Dai,Suting Chen
DOI: https://doi.org/10.1016/j.engappai.2024.108820
IF: 8
2024-06-20
Engineering Applications of Artificial Intelligence
Abstract:Semantic segmentation using RGB (Red-Green-Blue) images and thermal datas is an indispensable component of autonomous driving. The key to RGB-Thermal (RGB and Thermal) semantic segmentation is achieving the interaction and fusion of features between RGB and thermal images. Therefore, we propose a dual-branch deep cross-modal interaction network (DCIT) based on Encoder–Decoder structure. This framework consists of two parallel networks for feature extraction from RGB and Thermal data. Specifically, in each feature extraction stage of the Encoder, we design a Cross Feature Regulation Modules (CFRM) to align and correct modality specific features by reducing the inter-modality feature differences and eliminating intra-modality noise. Then, the modality features are aggregated through Cross Modal Feature Fusion Module (CMFFM) based on cross linear attention to capture global information from modality features. Finally, Adaptive Multi-Scale Cross-positional Fusion Module (AMCFM) utilizes the fused features to integrate consistent semantic information in the Decoder stage. Our framework can improve the interaction of cross modal features. Extensive experiments on urban scene datasets demonstrate that our proposed framework outperforms other RGB-Thermal semantic segmentation methods in terms of objective metrics and subjective visual assessments.
automation & control systems,computer science, artificial intelligence,engineering, electrical & electronic, multidisciplinary
What problem does this paper attempt to address?