Region-adaptive and context-complementary cross modulation for RGB-T semantic segmentation

Fengguang Peng, Zihan Ding, Ziming Chen, Gang Wang, Tianrui Hui, Si Liu, Hang Shi
DOI: https://doi.org/10.1016/j.patcog.2023.110092
IF: 8
2023-11-02
Pattern Recognition
Abstract: RGB-Thermal (RGB-T) semantic segmentation is an emerging task that aims to improve the robustness of segmentation methods under extreme imaging conditions with the aid of the thermal infrared modality. Foreground-background discrimination and complementary information mining are two key challenges of this task. Recent methods use naive channel attention and cross-attention to tackle these challenges, but they still yield sub-optimal solutions in which salient foreground features and noisy background features may be modulated equally, without distinction. The quadratic computational overhead of cross-attention also prevents its application to high-resolution features. Moreover, the lack of complementary information mining in the encoding phase hinders comprehensive scene encoding. To alleviate these limitations, we propose a cross modulation process with two collaborative components. The first, the Region-Adaptive Channel Modulation (RACM) module, conducts channel attention at a fine-grained region level, where foreground and background regions can be modulated differently in each channel. The second, the Context-Complementary Spatial Modulation (CCSM) module, mines and transfers complementary information between the two modalities early in the encoding phase. Experiments show that our method achieves state-of-the-art performance on current RGB-T segmentation benchmarks.
computer science, artificial intelligence, engineering, electrical & electronic
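
The abstract's idea of modulating foreground and background regions differently in each channel can be illustrated with a minimal NumPy sketch. This is not the authors' RACM implementation: the abstract does not say how regions are obtained or how the gates are computed, so the soft foreground mask `mask`, the masked average pooling, the single linear layers `w_fg` and `w_bg`, and the choice of the other modality as the `guide` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_avg_pool(feat, mask, eps=1e-6):
    # feat: (C, H, W); mask: (H, W) soft region weights in [0, 1].
    # Returns one descriptor per channel, averaged over the masked region.
    return (feat * mask).sum(axis=(1, 2)) / (mask.sum() + eps)

def region_channel_modulation(feat, guide, mask, w_fg, w_bg):
    # Hypothetical region-level channel gating (an assumption, not the paper's code).
    # Per-region channel descriptors are pooled from the guiding modality.
    d_fg = masked_avg_pool(guide, mask)
    d_bg = masked_avg_pool(guide, 1.0 - mask)
    # One gate vector per region, each with one value per channel.
    g_fg = sigmoid(w_fg @ d_fg)
    g_bg = sigmoid(w_bg @ d_bg)
    # Blend the two gate vectors per pixel according to the mask, giving (C, H, W).
    gate = mask[None] * g_fg[:, None, None] + (1.0 - mask)[None] * g_bg[:, None, None]
    return feat * gate

# Toy inputs: random features and a square "foreground" region.
rng = np.random.default_rng(0)
C, H, W = 4, 6, 6
rgb = rng.standard_normal((C, H, W))
thermal = rng.standard_normal((C, H, W))
mask = np.zeros((H, W))
mask[2:4, 2:4] = 1.0
w_fg = rng.standard_normal((C, C))
w_bg = rng.standard_normal((C, C))

# Modulate RGB features with gates derived from the thermal features.
out = region_channel_modulation(rgb, thermal, mask, w_fg, w_bg)
```

With a hard mask as above, every foreground pixel in a channel is scaled by that channel's foreground gate and every background pixel by its background gate, whereas plain channel attention would apply a single gate per channel across the whole feature map.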