Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen

2024-04-11

Abstract: High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as Convolutional Neural Network (CNN) and Vision Transformer (ViT). CNN-based methods struggle with handling such high-resolution images due to their limited receptive field, while ViT faces challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to efficiently capture global semantic information, we propose a semantic segmentation framework for high-resolution remotely sensed images, named Samba. Samba utilizes an encoder-decoder architecture, with Samba blocks serving as the encoder for efficient multi-level semantic information extraction, and UperNet functioning as the decoder. We evaluate Samba on the LoveDA, ISPRS Vaihingen, and ISPRS Potsdam datasets, comparing its performance against top-performing CNN and ViT methods. The results reveal that Samba achieved unparalleled performance on commonly used remote sensing datasets for semantic segmentation. Our proposed Samba demonstrates for the first time the effectiveness of SSM in semantic segmentation of remotely sensed images, setting a new benchmark in performance for Mamba-based techniques in this specific application. The source code and baseline implementations are available at
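
To make the State Space Model idea in the abstract concrete, the following is a minimal sketch of a plain discrete linear SSM run over a 1-D sequence. It is an illustration only and not the Samba block or Mamba's selective scan (where the parameters depend on the input); the function name `ssm_scan` and the matrices `A`, `B`, `C` are hypothetical choices made for this example. The point it shows is that each step updates a fixed-size state, so the cost grows linearly with sequence length.

```python
import numpy as np


def ssm_scan(A, B, C, x):
    """Run a discrete, time-invariant linear SSM over a 1-D input sequence.

    Recurrence (illustrative, not the paper's implementation):
        h[t] = A @ h[t-1] + B * x[t]
        y[t] = C @ h[t]
    """
    h = np.zeros(A.shape[0])      # hidden state, fixed size regardless of sequence length
    y = np.empty(len(x))
    for t, x_t in enumerate(x):   # one pass over the sequence: linear in its length
        h = A @ h + B * x_t
        y[t] = C @ h
    return y


# Hypothetical toy parameters: two decaying state channels driven by an impulse.
A = np.array([[0.5, 0.0], [0.0, 0.9]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])
x = np.array([1.0, 0.0, 0.0])
y = ssm_scan(A, B, C, x)
```

The output at each position depends on all earlier inputs through the state `h`, which is how an SSM carries long-range context without pairwise attention over the sequence.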

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the semantic segmentation of high-resolution remote sensing images. Specifically:

1. **Limitations of existing methods**:
   - Methods based on Convolutional Neural Networks (CNN) are constrained by their limited receptive field when processing high-resolution images.
   - Methods based on Vision Transformers (ViT) have a global attention mechanism, but their computational cost grows quadratically with sequence length, and they require a large amount of training data.
2. **Proposed new method**:
   - The paper proposes a semantic segmentation framework named Samba, which uses a State Space Model (SSM) to efficiently capture global semantic information.
   - Samba adopts an encoder-decoder architecture, where the encoder consists of Samba blocks and the decoder is UperNet.
3. **Experimental results**:
   - On several commonly used remote sensing image datasets (LoveDA, ISPRS Vaihingen, and ISPRS Potsdam), Samba outperforms the compared CNN and ViT methods, achieving the best mIoU.
   - On the LoveDA dataset in particular, Samba outperforms the best ViT method (Segformer) by 3.95% and the best CNN method (ConvNeXt) by 10.3%.
4. **Main contributions**:
   - The Mamba architecture is applied to semantic segmentation of remote sensing images for the first time.
   - The paper demonstrates the potential of the Mamba architecture as a backbone network for remote sensing image semantic segmentation.
   - The paper establishes new performance benchmarks and suggests directions for future research.
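
The mIoU metric cited in the results is the mean, over classes, of the intersection-over-union between predicted and ground-truth label maps. The sketch below shows one common way to compute it from a confusion matrix with NumPy. It is an assumption-laden illustration rather than the paper's evaluation code: the function name `mean_iou` is hypothetical, classes absent from both prediction and ground truth are skipped, and dataset-specific conventions such as ignored labels are not handled.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(float)
    # Union = ground-truth pixels + predicted pixels - correctly predicted pixels.
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp
    valid = union > 0             # skip classes that appear in neither map
    return (tp[valid] / union[valid]).mean()


# Toy 2x2 label maps with two classes.
target = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12.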

RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation

PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation

UNetMamba: An Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images

CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

RSMamba: Remote Sensing Image Classification With State Space Model

Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

ConvMambaSR: Leveraging State-Space Models and CNNs in a Dual-Branch Architecture for Remote Sensing Imagery Super-Resolution

FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation