Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification

Weilian Zhou, Sei-Ichiro Kamata, Haipeng Wang, Man-Sing Wong, Huiying
2024-07-13
Abstract: Hyperspectral image (HSI) classification is pivotal in the remote sensing (RS) field, particularly with the advancement of deep learning techniques. Sequential models adapted from the natural language processing (NLP) field, such as Recurrent Neural Networks (RNNs) and Transformers, have been tailored to this task, offering a unique viewpoint. However, several challenges persist: 1) RNNs struggle with centric feature aggregation and are sensitive to interfering pixels, 2) Transformers require significant computational resources and often underperform with limited HSI training samples, and 3) current scanning methods for converting images into sequence data are simplistic and inefficient. In response, this study introduces the Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt to deploy a State Space Model (SSM) in this task. The MiM model includes 1) a novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence data, 2) a Tokenized Mamba (T-Mamba) encoder that incorporates a Gaussian Decay Mask (GDM), a Semantic Token Learner (STL), and a Semantic Token Fuser (STF) for enhanced feature generation and concentration, and 3) a Weighted MCS Fusion (WMF) module coupled with a Multi-Scale Loss Design to improve decoding efficiency. Experimental results from three public HSI datasets with fixed and disjoint training-testing samples demonstrate that our method outperforms existing baselines and state-of-the-art approaches, highlighting its efficacy and potential in HSI applications.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses challenges in hyperspectral image (HSI) classification by proposing a new architecture, Mamba-in-Mamba (MiM), that aims to improve on the performance and efficiency of existing methods.

#### Main Issues:

1. **Limitations of RNNs**:
   - RNNs are sensitive to noisy (interfering) pixels when processing hyperspectral images and are computationally inefficient on larger image patches.
2. **Limitations of Transformers**:
   - Transformers require substantial computational resources and perform poorly when training samples are limited.
   - Transformers lack the ability to effectively capture local spatial features.

#### Solutions:

1. **Innovative Scanning Mechanism (Centralized Mamba-Cross-Scan, MCS)**:
   - A new scanning method converts image patches into sequences along multiple directions, so that the features of the central pixel are captured more effectively.
2. **Tokenized Mamba Encoder (T-Mamba Encoder)**:
   - Combines a Gaussian Decay Mask (GDM), a Semantic Token Learner (STL), and a Semantic Token Fuser (STF) to enhance feature generation and concentration.
3. **Weighted MCS Fusion Module (WMF)**:
   - Combined with a multi-scale loss design to improve decoding efficiency.

With these components, the paper reports that the approach outperforms existing baselines and state-of-the-art methods on three public hyperspectral image datasets, indicating its effectiveness and potential in hyperspectral image classification.
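
To make the idea of a center-focused scan followed by a Gaussian decay mask more concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the single snake-style scan, the helper names `snake_scan` and `gaussian_decay_weights`, and the `sigma` value are all assumptions chosen for illustration, and the summary above does not specify the actual scan paths or mask formula used in MiM.

```python
import math


def snake_scan(patch):
    """Flatten a 2-D patch row by row, reversing every other row.

    Hypothetical stand-in for one scan direction; the real MCS
    uses several scan paths that this sketch does not reproduce.
    """
    seq = []
    for r, row in enumerate(patch):
        seq.extend(row if r % 2 == 0 else reversed(row))
    return seq


def gaussian_decay_weights(length, center, sigma):
    """Weights that peak at `center` and decay with sequence distance.

    Assumed form: exp(-(i - center)^2 / (2 * sigma^2)).
    """
    return [
        math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
        for i in range(length)
    ]


# Toy 3x3 "patch" whose values are just pixel indices; 4 is the center pixel.
patch = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]

seq = snake_scan(patch)
center = seq.index(4)  # position of the center pixel in the sequence
weights = gaussian_decay_weights(len(seq), center, sigma=1.5)

# Down-weight sequence elements that are far from the center pixel.
masked = [w * v for w, v in zip(weights, seq)]
```

For an odd-sized square patch, this snake scan places the center pixel in the middle of the sequence, so the mask gives it weight 1 and shrinks the contribution of elements further away along the scan. In a real model the sequence elements would be feature vectors rather than scalars, and the weights would be applied per token.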