Abstract: Deep learning methods, especially Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation of the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, and ISPRS Vaihingen datasets, we demonstrate that no single scanning strategy outperforms the others, regardless of its complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.

What problem does this paper attempt to address?

This paper asks whether the choice of scanning strategy significantly affects the performance of the Mamba model in semantic segmentation of high-resolution remote sensing images. Specifically, it experimentally studies how mainstream scanning directions and their combinations affect segmentation performance, and whether these strategies are effective at all.

### Background and Problem

1. **Background**:
   - Deep learning methods, especially Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), are commonly used for semantic segmentation of high-resolution remote sensing images.
   - CNNs have a limited receptive field, which makes it hard for them to capture long-range semantic dependencies in high-resolution images.
   - ViTs have a global receptive field, but their quadratic complexity makes processing high-resolution images expensive.
   - Recently, the Mamba model has gained attention for combining linear complexity with a global receptive field, and it is being applied to vision tasks.
2. **Problem**:
   - The Mamba model operates on sequences, so an image must first be serialized into a sequence compatible with the model.
   - Many studies have proposed different scanning strategies for this serialization, aiming to enhance the Mamba model's understanding of images.
   - However, the effectiveness of these scanning strategies has not been fully validated.

### Research Objectives

- To evaluate, through extensive experiments, the impact of mainstream scanning directions and their combinations on semantic segmentation of high-resolution remote sensing images.
- To verify whether any specific scanning strategy significantly improves the segmentation performance of the Mamba model.

### Experimental Design

- Experiments were conducted on three datasets: LoveDA, ISPRS Potsdam, and ISPRS Vaihingen.
- 22 scanning strategies were designed: 12 individual scanning directions and 10 combinations of scanning directions.
- Segmentation performance was evaluated using the mIoU (mean Intersection over Union) metric.

### Main Findings

- The differences in segmentation performance between scanning strategies are minimal.
- A single scanning direction (e.g., D1) is already effective, and complex multi-directional scanning does not bring significant performance improvements.
- This indicates that, for semantic segmentation of high-resolution remote sensing images, the Mamba model is not sensitive to the scanning strategy.

### Conclusion

- For semantic segmentation of high-resolution remote sensing images, a simple single-direction scanning strategy (e.g., D1) is effective.
- Complex multi-directional scanning strategies do not significantly improve segmentation performance. Using a single direction instead reduces computational demands and allows deeper networks to be built with limited computational resources.

### Future Work

- Explore other ways to enhance the Mamba model's understanding of remote sensing images, rather than relying solely on different scanning strategies.
- Further investigate the potential applications of the Mamba model in other vision tasks.
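To make the notions of "scanning direction" and mIoU used above concrete, here is a minimal NumPy sketch. It is not the paper's implementation. The direction names (`row`, `col`, `row_snake`, and so on) and all function names are hypothetical illustrations and do not correspond to the paper's D1 to D12 labels, whose exact definitions are not given in this summary. The mIoU helper assumes integer label maps and skips classes that appear in neither prediction nor ground truth, which is one common convention among several.

```python
import numpy as np


def scan_order(height, width, direction="row"):
    """Return flat patch indices in the order one scan direction visits them.

    The direction names are illustrative only, not the paper's D1..D12.
    """
    grid = np.arange(height * width).reshape(height, width)
    if direction == "row":            # left-to-right, top-to-bottom
        order = grid.reshape(-1)
    elif direction == "row_reverse":  # same path, traversed backwards
        order = grid.reshape(-1)[::-1]
    elif direction == "col":          # top-to-bottom, left-to-right
        order = grid.T.reshape(-1)
    elif direction == "col_reverse":  # column path, traversed backwards
        order = grid.T.reshape(-1)[::-1]
    elif direction == "row_snake":    # every second row is read right-to-left
        snake = grid.copy()
        snake[1::2] = grid[1::2, ::-1]
        order = snake.reshape(-1)
    else:
        raise ValueError(f"unknown direction: {direction}")
    return order.copy()


def serialize(patches, direction="row"):
    """Turn an (H, W, C) grid of patch features into an (H*W, C) sequence."""
    h, w, c = patches.shape
    return patches.reshape(h * w, c)[scan_order(h, w, direction)]


def deserialize(sequence, height, width, direction="row"):
    """Invert serialize(): put sequence items back at their grid positions."""
    order = scan_order(height, width, direction)
    flat = np.empty_like(sequence)
    flat[order] = sequence
    return flat.reshape(height, width, -1)


def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over classes present in pred or target."""
    ious = []
    for cls in range(num_classes):
        pred_c = pred == cls
        target_c = target == cls
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: skipped (an assumption)
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


if __name__ == "__main__":
    features = np.random.rand(4, 4, 8)       # toy 4x4 grid of 8-dim patches
    seq = serialize(features, "row_snake")   # sequence a Mamba block would see
    restored = deserialize(seq, 4, 4, "row_snake")
    print(seq.shape, np.allclose(restored, features))
```

Under this sketch, a multi-directional strategy would serialize the same grid along several directions, process each sequence separately, restore each output with `deserialize`, and merge the results. How the merge is performed is not described in this summary, so it is left out here.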

Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study
