Abstract: Combining color (RGB) images with thermal images can facilitate semantic segmentation of poorly lit urban scenes. However, for RGB-thermal (RGB-T) semantic segmentation, most existing models address cross-modal feature fusion by exploring individual samples only, neglecting the connections between different samples. Additionally, although the importance of boundary, binary, and semantic information is considered in the decoding process, the differences and complementarities between different morphological features are usually neglected. In this paper, we propose a novel RGB-T semantic segmentation network, called MMSMCNet, based on modal memory fusion and morphological multiscale assistance to address these problems. In the encoding part of the network, we use SegFormer to extract features from the bimodal inputs. Next, our modal memory sharing module implements staged learning and memory sharing of sample information across modalities and scales. Furthermore, we construct a decoding union unit comprising three decoding units in a layer-by-layer progression, which extracts two different morphological features according to the information category and realizes the complementary utilization of multiscale cross-modal fusion information. Each unit contains a contour positioning module based on detail information, a skeleton positioning module with deep features as the primary input, and a morphological complementary module that mutually reinforces the first two types of information and constructs semantic information. Based on this, we construct a new supervision strategy, namely a multi-unit-based complementary supervision strategy. Extensive experiments on two standard datasets show that MMSMCNet outperforms related state-of-the-art methods. The code is available at: https://github.com/2021nihao/MMSMCNet.

MMSMCNet: Modal Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation
