Abstract: Real-time remote sensing segmentation is crucial for unmanned aerial vehicles (UAVs) in applications such as battlefield surveillance, land characterization, and earthquake disaster assessment, and it can significantly enhance the value of UAVs in both military and civilian fields. To realize this potential, it is essential to develop real-time semantic segmentation methods that can run on resource-limited platforms such as edge devices. Most mainstream real-time semantic segmentation methods rely on convolutional neural networks (CNNs) or transformers. However, CNNs cannot effectively capture long-range dependencies, while transformers have high computational complexity. This paper proposes RTMamba, a novel Mamba architecture for real-time segmentation of remote sensing images. Specifically, the backbone uses a Visual State-Space (VSS) block to extract deep features, capturing long-range contextual information while maintaining linear computational complexity. Additionally, a novel Inverted Triangle Pyramid Pooling (ITP) module is incorporated into the decoder. The ITP module can effectively filter redundant feature information and enhance the perception of objects and their boundaries in remote sensing images. Extensive experiments were conducted on three challenging aerial remote sensing segmentation benchmarks: Vaihingen, Potsdam, and LoveDA. The results show that RTMamba is competitive with state-of-the-art CNN and transformer methods in both segmentation accuracy and inference speed. To further validate the deployment potential of the model on resource-limited embedded platforms such as UAVs, we conducted tests on the Jetson AGX Orin edge device. The experimental results demonstrate that RTMamba achieves real-time segmentation performance on this device.
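The contrast the abstract draws between linear-complexity state-space models and transformers can be illustrated with a toy example. The following is a minimal sketch, not RTMamba's VSS block: it assumes a one-dimensional input, a diagonal state matrix, and fixed parameters, and it omits the input-dependent (selective) parameters and 2D scanning used in visual state-space models. The names `ssm_scan` and `attention_pair_count` are hypothetical and introduced only for illustration.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)

    Assumption: a, b, c are fixed lists of equal length (the state size).
    """
    state = [0.0] * len(a)
    ys = []
    for x in xs:  # a single pass: work grows linearly with len(xs)
        state = [a_i * h_i + b_i * x for a_i, b_i, h_i in zip(a, b, state)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return ys


def attention_pair_count(seq_len):
    """Number of query-key pairs scored by full self-attention."""
    return seq_len * seq_len  # grows quadratically with sequence length


if __name__ == "__main__":
    # An impulse at t=0 keeps influencing later outputs through the state,
    # which is how the recurrence carries long-range context.
    outputs = ssm_scan([1.0, 0.0, 0.0], a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 1.0])
    print(outputs)
    print(attention_pair_count(1024))
```

In this sketch, doubling the sequence length doubles the number of recurrence steps, whereas the number of attention pairs quadruples. That scaling difference is the motivation the abstract gives for a state-space backbone on edge devices.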

ReMamber: Referring Image Segmentation with Mamba Twister

MambaReID: Exploiting Vision Mamba for Multi-Modal Object Re-Identification

Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model

A Hybrid Framework for Referring Image Segmentation: Dual-Decoder Model with SAM Complementation

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

V2M: Visual 2-Dimensional Mamba for Image Representation Learning

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

Mamba-R: Vision Mamba ALSO Needs Registers

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution