Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation

Wenzhang Zhang, Tianhong Dai, Hengyan Liu, Guangyu Ren, Longfei Yin
DOI: https://doi.org/10.1109/ICCCN61486.2024.10637614
2024-07-29
Abstract: Semantic segmentation networks designed exclusively for RGB inputs often suffer quality degradation under adverse conditions such as low illumination or inclement weather. Recent work has shown promising results by integrating RGB images with corresponding thermal infrared (TIR) images. However, effectively fusing features from both modalities remains a significant challenge. In this paper, we introduce a novel approach termed Multimodal Frequency Spectrum Fusion Schema (MFSFS) for semantic segmentation of RGB-T images. MFSFS leverages the frequency spectrum to extract and utilize multimodal feature information. To mitigate the adverse effects of redundant information during multimodal fusion in the frequency domain, we propose a diversity-oriented contrastive learning approach. Simulation results demonstrate that MFSFS achieves competitive performance while maintaining a relatively small model size.
Environmental Science, Engineering, Computer Science