CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

Mushui Liu, Jun Dan, Ziqian Lu, Yunlong Yu, Yingming Li, Xi Li

2024-05-17

Abstract: Due to the large image size and object variations, current CNN-based and Transformer-based approaches for remote sensing image semantic segmentation are either suboptimal at capturing long-range dependencies or limited by high computational complexity. In this paper, we propose CM-UNet, comprising a CNN-based encoder for extracting local image features and a Mamba-based decoder for aggregating and integrating global information, facilitating efficient semantic segmentation of remote sensing images. Specifically, a CSMamba block is introduced to build the core segmentation decoder, which employs channel and spatial attention as the gate activation condition of the vanilla Mamba to enhance feature interaction and global-local information fusion. Moreover, to further refine the output features from the CNN encoder, a Multi-Scale Attention Aggregation (MSAA) module is employed to merge features of different scales. By integrating the CSMamba block and MSAA module, CM-UNet effectively captures the long-range dependencies and multi-scale global contextual information of large-scale remote sensing images. Experimental results obtained on three benchmarks indicate that the proposed CM-UNet outperforms existing methods in various performance metrics. The codes are available at

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses semantic segmentation of remote sensing images, which is difficult because the images are large and the objects in them vary widely. Existing CNN-based and Transformer-based methods either capture long-range dependencies poorly or incur high computational complexity. To this end, the paper proposes CM-UNet, a framework that pairs a CNN encoder, which extracts local image features, with a Mamba-based decoder, which aggregates and fuses global information, for efficient semantic segmentation of remote sensing images. The decoder is built from CSMamba blocks, which use channel and spatial attention as the gate activation condition of the vanilla Mamba to enhance feature interaction and global-local information fusion. To further refine the features output by the CNN encoder, a Multi-Scale Attention Aggregation (MSAA) module fuses features at different scales. By integrating the CSMamba block and the MSAA module, CM-UNet captures long-range dependencies and multi-scale global contextual information in large-scale remote sensing images. Experimental results on three benchmarks indicate that CM-UNet outperforms existing methods.
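To make the gating idea concrete, here is a minimal pure-Python sketch of a global branch whose output is modulated by channel and spatial attention. It is an illustration only, not the authors' CSMamba implementation: the function names, the pooling-plus-sigmoid gates, and the `decay` parameter are all assumptions, and the Mamba scan is replaced by a simple linear recurrence over row-major flattened pixels.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(x):
    """One weight per channel, from global average pooling (assumed form).

    x is a feature map stored as nested lists with shape [C][H][W].
    """
    gates = []
    for channel in x:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates


def spatial_gate(x):
    """One weight per pixel, from the mean over channels (assumed form)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def linear_scan(x, decay=0.5):
    """Stand-in for the Mamba branch (NOT a real selective scan).

    Runs h_t = decay * h_{t-1} + (1 - decay) * x_t over the pixels of each
    channel in row-major order, so every output depends on all earlier pixels.
    """
    out = []
    for channel in x:
        w = len(channel[0])
        state, flat = 0.0, []
        for row in channel:
            for v in row:
                state = decay * state + (1.0 - decay) * v
                flat.append(state)
        out.append([flat[i:i + w] for i in range(0, len(flat), w)])
    return out


def gated_block(x, decay=0.5):
    """Multiply the global branch by the channel and spatial gates."""
    cg, sg = channel_gate(x), spatial_gate(x)
    y = linear_scan(x, decay)
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[y[k][i][j] * cg[k] * sg[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]
```

Calling `gated_block` on a `[C][H][W]` nested list returns a feature map of the same shape. The gates are computed from the input rather than from the scanned features, so local statistics decide how much of the globally aggregated signal passes through; that choice is also an assumption of this sketch.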

UNetMamba: An Efficient UNet-Like Mamba for Semantic Segmentation of High-Resolution Remote Sensing Images

Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

Link Aggregation for Skip Connection–Mamba: Remote Sensing Image Segmentation Network Based on Link Aggregation Mamba

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images

RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation

MCAT-UNet: Convolutional and Cross-Shaped Window Attention Enhanced UNet for Efficient High-Resolution Remote Sensing Image Segmentation

RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation

Semantic segmentation of remote sensing images combined with attention mechanism and feature enhancement U-Net

Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

CM-Unet: A Novel Remote Sensing Image Segmentation Method Based on Improved U-Net

HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation

MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

UAM-Net: an Attention-Based Multi-level Feature Fusion UNet for Remote Sensing Image Segmentation

LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation

Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images

A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images