UNetMamba: An Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images

Enze Zhu, Zhan Chen, Dingkai Wang, Hanru Shi, Xiaoxuan Liu, Lei Wang

2024-10-21

Abstract: Semantic segmentation of high-resolution remote sensing images is vital in downstream applications such as land-cover mapping, urban planning and disaster assessment. Existing Transformer-based methods suffer from the constraint between accuracy and efficiency, while the recently proposed Mamba is renowned for being efficient. Therefore, to overcome the dilemma, we propose UNetMamba, a UNet-like semantic segmentation model based on Mamba. It incorporates a mamba segmentation decoder (MSD) that can efficiently decode the complex information within high-resolution images, and a local supervision module (LSM), which is train-only but can significantly enhance the perception of local contents. Extensive experiments demonstrate that UNetMamba outperforms the state-of-the-art methods with mIoU increased by 0.87% on LoveDA and 0.39% on ISPRS Vaihingen, while achieving high efficiency through the lightweight design, less memory footprint and reduced computational cost. The source code is available at https://github.com/EnzeZhu2001/UNetMamba.
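For readers unfamiliar with the mIoU (mean Intersection over Union) metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed from predicted and ground-truth label maps. The function name `mean_iou` and the toy label lists are illustrative assumptions and are not taken from the UNetMamba code. Benchmark evaluations typically accumulate these counts over a whole dataset rather than a single image.

```python
from collections import Counter


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    `pred` and `target` are flattened sequences of integer class labels.
    """
    intersection = Counter()   # pixels where prediction and ground truth agree
    pred_count = Counter()     # pixels predicted as each class
    target_count = Counter()   # pixels labelled as each class
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union == 0:
            continue  # class absent from both maps, so it is skipped
        ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy, hypothetical label maps flattened to 1-D.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, num_classes=3), 4))  # → 0.5
```

In this toy case the per-class IoU values are 1/3, 2/3 and 1/2, which average to 0.5. A reported gain of 0.87% in mIoU corresponds to a shift of 0.0087 on this 0 to 1 scale, assuming the figure is in percentage points.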

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper attempts to address the trade-off between accuracy and efficiency in semantic segmentation of high-resolution remote sensing images. Existing Transformer-based methods significantly improve accuracy, but their high computational complexity and large parameter counts make them inefficient on high-resolution images. The recently proposed Mamba model, on the other hand, is efficient, but its performance on specific tasks has not been fully validated.

To solve this problem, the authors propose UNetMamba, a UNet-like model based on Mamba. The model achieves efficient semantic segmentation through the following three main components:

1. **Encoder**: Uses the ResT backbone network to capture multi-scale feature maps through an Efficient Multi-head Self-Attention mechanism (EMSA).
2. **Mamba Segmentation Decoder (MSD)**: Applies the basic unit of Mamba (VSS block) on the decoding side to efficiently decode complex information with linear complexity.
3. **Local Supervision Module (LSM)**: Enhances the perception of local semantic information through two convolutional branches of different scales and an auxiliary loss function.

Experimental results show that UNetMamba achieves higher accuracy on the LoveDA and ISPRS Vaihingen high-resolution remote sensing image datasets (mIoU increased by 0.87% and 0.39%, respectively), while also remaining lightweight, with low memory usage and low computational cost.
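To make the "linear complexity" argument concrete, the sketch below compares how the amount of work grows with image size for plain pairwise self-attention versus a sequential scan that touches each token once. It is an order-of-growth illustration only: the helper names, the image sizes and the downsampling stride of 4 are assumptions, constant factors are ignored, and the numbers are not FLOP counts from the paper. Note also that the EMSA used in the encoder is itself designed to reduce attention cost, so the quadratic column describes plain self-attention rather than UNetMamba's encoder.

```python
def token_count(height, width, stride):
    """Number of tokens in a feature map downsampled by `stride` (assumed value)."""
    return (height // stride) * (width // stride)


def pairwise_interactions(n_tokens):
    """Plain self-attention scores every token against every token: O(N^2)."""
    return n_tokens * n_tokens


def sequential_steps(n_tokens):
    """A linear-time scan performs one state update per token: O(N)."""
    return n_tokens


# Hypothetical square inputs; doubling the side length quadruples the tokens.
for side in (256, 512, 1024):
    n = token_count(side, side, stride=4)
    print(f"{side}x{side}: tokens={n}, "
          f"quadratic={pairwise_interactions(n)}, linear={sequential_steps(n)}")
```

Each doubling of the image side multiplies the quadratic term by 16 but the linear term only by 4, which is why linear-complexity decoding becomes more attractive as resolution grows.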

CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

Link Aggregation for Skip Connection–Mamba: Remote Sensing Image Segmentation Network Based on Link Aggregation Mamba

RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation

UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery

UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation

HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation

LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation

Efficient Multi-scale Network for Semantic Segmentation of fine-Resolution Remotely Sensed Images

MCAT-UNet: Convolutional and Cross-Shaped Window Attention Enhanced UNet for Efficient High-Resolution Remote Sensing Image Segmentation

Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation