RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation

Xianping Ma, Xiaokang Zhang, Man-On Pun

2024-04-03

Abstract: Semantic segmentation of remote sensing images is a fundamental task in geoscience research. However, the widely used convolutional neural networks (CNNs) and Transformers both have significant shortcomings. The former are limited by their insufficient long-range modeling capabilities, while the latter are hampered by their computational complexity. Recently, a novel visual state space (VSS) model represented by Mamba has emerged, capable of modeling long-range relationships with linear computational complexity. In this work, we propose a novel dual-branch network named remote sensing images semantic segmentation Mamba (RS3Mamba) to incorporate this innovative technology into remote sensing tasks. Specifically, RS3Mamba utilizes VSS blocks to construct an auxiliary branch, providing additional global information to the convolution-based main branch. Moreover, considering the distinct characteristics of the two branches, we introduce a collaborative completion module (CCM) to enhance and fuse features from the dual encoder. Experimental results on two widely used datasets, ISPRS Vaihingen and LoveDA Urban, demonstrate the effectiveness and potential of the proposed RS3Mamba. To the best of our knowledge, this is the first vision Mamba specifically designed for remote sensing images semantic segmentation. The source code will be made available at

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses semantic segmentation of remote sensing images. The two model families most widely used for this task, Convolutional Neural Networks (CNNs) and Transformers, each have a significant drawback:

1. **Limitations of CNNs**: CNNs are limited by their local receptive fields, which makes it difficult to capture global context. This is a problem for remote sensing images, which contain complex scenes and large variations in object scale.
2. **Computational Complexity of Transformers**: Although Transformers can model long-range dependencies, their high computational complexity hurts model efficiency and increases memory consumption.

To address these issues, the authors propose a new dual-branch network architecture named RS3Mamba, which leverages the Visual State Space (VSS) model to enhance feature extraction. Specifically, RS3Mamba consists of a convolution-based main branch and an auxiliary branch that provides additional global information through VSS blocks. Furthermore, because the features produced by the two branches differ in character, a Collaborative Completion Module (CCM) is introduced to enhance and fuse the features from the dual encoders. Experimental results show that RS3Mamba outperforms existing CNN-based and Transformer-based methods on the ISPRS Vaihingen and LoveDA Urban datasets.
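The efficiency contrast between Transformers and state space models can be made concrete with a toy example. The sketch below is not the RS3Mamba code and not a VSS block: it assumes a scalar state space recurrence with fixed parameters `a`, `b`, and `c` (Mamba makes its parameters input-dependent), and the function names are hypothetical. It only illustrates that a single pass over the sequence carries information from early positions to later ones at a cost linear in the sequence length, whereas full self-attention scores every pair of positions.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Illustrative assumption: a, b, c are fixed scalars. The work per
    step is constant, so the whole scan costs O(L) for length L.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # state update carries past inputs forward
        ys.append(c * h)   # readout at the current position
    return ys


def ssm_step_count(length):
    """State updates needed by the scan: one per position."""
    return length


def attention_pair_count(length):
    """Query-key pairs scored by full self-attention: L * L."""
    return length * length


if __name__ == "__main__":
    # An impulse at position 0 still influences later outputs.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
    # Cost comparison for a 1024-token sequence.
    print(ssm_step_count(1024), attention_pair_count(1024))
```

With `a=0.5`, the impulse at the first position decays but remains visible at every later position, which is the long-range behavior the recurrence provides without pairwise comparisons.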

PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation

Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

RSMamba: Remote Sensing Image Classification With State Space Model

UNetMamba: An Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images

A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation

CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

VM-UNET-V2: Rethinking Vision Mamba UNet for Medical Image Segmentation

Link Aggregation for Skip Connection–Mamba: Remote Sensing Image Segmentation Network Based on Link Aggregation Mamba

Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study

MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

ConvMambaSR: Leveraging State-Space Models and CNNs in a Dual-Branch Architecture for Remote Sensing Imagery Super-Resolution

PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

RSDehamba: Lightweight Vision Mamba for Remote Sensing Satellite Image Dehazing