Channel and Spatial Relation-Propagation Network for RGB-Thermal Semantic Segmentation

Zikun Zhou, Shukun Wu, Guoqing Zhu, Hongpeng Wang, Zhenyu He
2023-08-24
Abstract: RGB-Thermal (RGB-T) semantic segmentation has shown great potential in handling low-light conditions where RGB-based segmentation is hindered by poor RGB imaging quality. The key to RGB-T semantic segmentation is to effectively leverage the complementary nature of RGB and thermal images. Most existing algorithms fuse RGB and thermal information in feature space via concatenation, element-wise summation, or attention operations, in either a unidirectional enhancement or a bidirectional aggregation manner. However, they usually overlook the modality gap between RGB and thermal images during feature fusion, so modality-specific information from one modality contaminates the other. In this paper, we propose a Channel and Spatial Relation-Propagation Network (CSRPNet) for RGB-T semantic segmentation, which propagates only modality-shared information across modalities and thereby alleviates the modality-specific information contamination issue. CSRPNet first performs relation-propagation in the channel and spatial dimensions to capture modality-shared features from the RGB and thermal features. It then aggregates the modality-shared features captured from one modality with the input feature of the other modality, enhancing that input feature without the contamination issue. While being fused together, the enhanced RGB and thermal features are also fed into the subsequent RGB and thermal feature extraction layers, respectively, for interactive feature fusion. We also introduce a dual-path cascaded feature refinement module that aggregates multi-layer features to produce two refined features for semantic and boundary prediction. Extensive experimental results demonstrate that CSRPNet performs favorably against state-of-the-art algorithms.
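The relation-propagation and aggregation steps described in the abstract can be illustrated with a minimal NumPy sketch. It rests on assumptions that are ours, not the paper's stated formulation: the channel relation is a row-softmaxed, scaled dot-product matrix of size C×C; the spatial relation is the analogous N×N matrix over the N = H×W pixels; both modalities have the same channel count; and aggregation is a plain residual sum. In this reading, features captured from the source modality are weighted by their relation to the target modality before being added to the target's input. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_shared(target, source):
    """Capture features from `source` via a (C, C) channel relation with `target`.

    target, source: flattened feature maps of shape (C, N), N = H * W.
    Assumed form: scaled dot-product relation, softmax over source channels.
    """
    n = target.shape[1]
    relation = softmax(target @ source.T / np.sqrt(n), axis=1)  # (C, C), rows sum to 1
    return relation @ source  # (C, N): source channels re-weighted per target channel


def spatial_shared(target, source):
    """Capture features from `source` via an (N, N) pixel relation with `target`."""
    c = target.shape[0]
    relation = softmax(target.T @ source / np.sqrt(c), axis=1)  # (N, N), rows sum to 1
    return source @ relation.T  # (C, N): column p mixes source pixels related to target pixel p


def relation_propagation(rgb, thermal):
    """Enhance each modality with the shared features captured from the other.

    rgb, thermal: (C, H, W) feature maps of identical shape.
    Aggregation is assumed to be a residual sum (hypothetical choice).
    """
    c, h, w = rgb.shape
    r = rgb.reshape(c, h * w)
    t = thermal.reshape(c, h * w)
    rgb_out = r + channel_shared(r, t) + spatial_shared(r, t)
    thermal_out = t + channel_shared(t, r) + spatial_shared(t, r)
    return rgb_out.reshape(c, h, w), thermal_out.reshape(c, h, w)


rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 5))
thermal = rng.standard_normal((8, 4, 5))
rgb_e, thermal_e = relation_propagation(rgb, thermal)
print(rgb_e.shape, thermal_e.shape)  # (8, 4, 5) (8, 4, 5)
```

The cross-attention-style form is used only because it is a common way to turn a relation matrix into propagated features; the paper's own normalization, learned projections, and aggregation may differ.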
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a key challenge in RGB-Thermal (RGB-T) semantic segmentation: how to effectively fuse information from RGB images and thermal infrared images despite the inherent differences between the two modalities (the modality gap). It proposes a new network to tackle this.

### Problems the Paper Aims to Solve

1. **Modality Gap**: RGB and thermal infrared images differ inherently, so fusing them directly lets information unique to one modality contaminate the features of the other. The paper calls this modality-specific information contamination.
2. **Limitations of Existing Methods**: Most current RGB-T semantic segmentation algorithms fuse RGB and thermal infrared features through unidirectional enhancement or bidirectional aggregation. These methods often overlook the modality gap, so modality-specific information contaminates the fused features.

### Solution Overview

To address these issues, the paper proposes the Channel and Spatial Relation-Propagation Network (CSRPNet). Its core idea is to extract modality-shared features through relation propagation before fusing features from different modalities, and then to use only these shared features for interactive multi-modal fusion, thereby avoiding modality-specific information contamination.

### Key Technical Points

- **Channel and Spatial Relation-Propagation Module**: This module first computes the inter-channel and inter-pixel relation matrices between the RGB and thermal infrared features. It then captures modality-shared features through matrix operations and uses them to enhance the input features of the other modality, achieving interactive fusion.
- **Dual-path Cascaded Feature Refinement Module**: To make full use of the multi-layer fused features, the paper also designs a Dual-path Cascaded Feature Refinement (DCFR) module. It produces two refined feature maps through two paths, one for boundary prediction and one for semantic prediction.

### Summary

In summary, the paper tackles the modality gap in RGB-T semantic segmentation with CSRPNet, which propagates only modality-shared information between RGB and thermal infrared features so that the characteristics of each modality are preserved during fusion, thereby improving segmentation accuracy and robustness.
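The DCFR module is described only at a high level here: multi-layer features go in, two paths run, and two refined feature maps come out. The NumPy sketch below is one hypothetical reading, not the paper's design. Each path is a top-down cascade that starts from the deepest feature map, upsamples it 2× with nearest-neighbour interpolation, adds the next shallower map, and applies a 1×1 channel-mixing convolution followed by ReLU. The two paths share their inputs but use separate (here random) weights, and their outputs feed a semantic head and a sigmoid boundary head. The 2× resolution ratio between levels, the layer choices, the class count, and all names are assumptions.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution: weight is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def cascade(features, weights):
    """Top-down cascade (hypothetical): upsample, add the shallower map, mix channels.

    features: (C, H_i, W_i) maps ordered shallow -> deep, each level assumed to be
    half the spatial size of the previous one.
    weights: one (C, C) mixing matrix per fusion step, len(features) - 1 in total.
    """
    out = features[-1]
    for feat, w in zip(reversed(features[:-1]), weights):
        out = np.maximum(conv1x1(upsample2x(out) + feat, w), 0.0)  # ReLU
    return out


def dual_path_refine(features, sem_weights, bnd_weights):
    """Two independent cascades over the same multi-layer features."""
    return cascade(features, sem_weights), cascade(features, bnd_weights)


rng = np.random.default_rng(0)
C, num_classes = 8, 6  # arbitrary sizes for illustration
features = [rng.standard_normal((C, 16 // 2**i, 16 // 2**i)) for i in range(3)]
sem_w = [rng.standard_normal((C, C)) * 0.1 for _ in range(2)]
bnd_w = [rng.standard_normal((C, C)) * 0.1 for _ in range(2)]
sem_feat, bnd_feat = dual_path_refine(features, sem_w, bnd_w)

# Hypothetical prediction heads: per-pixel class logits and boundary probability.
seg_logits = conv1x1(sem_feat, rng.standard_normal((num_classes, C)))
boundary_prob = 1.0 / (1.0 + np.exp(-conv1x1(bnd_feat, rng.standard_normal((1, C)))))
```

Keeping the two cascades separate mirrors the description of two refined features for two prediction targets; whether the actual DCFR paths differ in direction, share weights, or exchange information is not stated in this summary.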