Abstract: Combining color (RGB) images with thermal images can facilitate semantic segmentation of poorly lit urban scenes. However, most existing RGB-thermal (RGB-T) semantic segmentation models address cross-modal feature fusion by exploring individual samples only, neglecting the connections between different samples. Additionally, although the importance of boundary, binary, and semantic information is considered in the decoding process, the differences and complementarities between different morphological features are usually neglected. In this paper, we propose a novel RGB-T semantic segmentation network, called MMSMCNet, based on modal memory fusion and morphological multiscale assistance to address these problems. In the encoding part, we use SegFormer to extract features from the bimodal inputs. Next, our modal memory sharing module implements staged learning and memory sharing of sample information across modalities and scales. Furthermore, we construct a decoding union unit comprising three decoding units in a layer-by-layer progression; it extracts two different morphological features according to the information category and realizes the complementary utilization of multiscale cross-modal fusion information. Each unit contains a contour positioning module based on detail information, a skeleton positioning module with deep features as the primary input, and a morphological complementary module that mutually reinforces the first two types of information and constructs semantic information. Based on this design, we construct a new multi-unit-based complementary supervision strategy. Extensive experiments on two standard datasets show that MMSMCNet outperforms related state-of-the-art methods. The code is available at: https://github.com/2021nihao/MMSMCNet.
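
One way to read the multi-unit-based complementary supervision strategy is as a sum of per-unit losses, where each of the three decoding units is supervised on its contour, skeleton, and semantic outputs. The form below is only a sketch under that assumption: the abstract does not state the loss functions, the weights, or how the units are combined, so the symbols $\lambda_c$, $\lambda_k$, $\lambda_s$ and the per-unit terms are hypothetical notation introduced here for illustration.

```latex
% Assumed notation (not from the abstract):
%   u                      index over the three decoding units
%   \mathcal{L}_{c}^{(u)}  contour loss of unit u
%   \mathcal{L}_{k}^{(u)}  skeleton loss of unit u
%   \mathcal{L}_{s}^{(u)}  semantic segmentation loss of unit u
%   \lambda_c, \lambda_k, \lambda_s  hypothetical balancing weights
\[
  \mathcal{L}_{\text{total}}
  = \sum_{u=1}^{3}
    \left(
      \lambda_c \, \mathcal{L}_{c}^{(u)}
      + \lambda_k \, \mathcal{L}_{k}^{(u)}
      + \lambda_s \, \mathcal{L}_{s}^{(u)}
    \right)
\]
```

Under this reading, every unit receives its own gradient signal for both morphological features as well as for the semantic prediction, rather than only the final output being supervised. The released code should be consulted for the actual formulation.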


MMSMCNet: Modal Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation
