Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Shufan Li, Harkanwar Singh, Aditya Grover

2024-07-14

Abstract: In recent years, Transformers have become the de facto architecture for sequence modeling on text and a variety of multi-dimensional data, such as images and video. However, the use of self-attention layers in a Transformer incurs prohibitive compute and memory complexity that scales quadratically w.r.t. the sequence length. A recent architecture, Mamba, based on state space models has been shown to achieve comparable performance for modeling text sequences, while scaling linearly with the sequence length. In this work, we present Mamba-ND, a generalized design extending the Mamba architecture to arbitrary multi-dimensional data. Our design alternately unravels the input data across different dimensions following row-major orderings. We provide a systematic comparison of Mamba-ND with several other alternatives, based on prior multi-dimensional extensions such as Bi-directional LSTMs and S4ND. Empirically, we show that Mamba-ND demonstrates performance competitive with the state-of-the-art on a variety of multi-dimensional benchmarks, including ImageNet-1K classification, HMDB-51 action recognition, and ERA5 weather forecasting.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper primarily attempts to solve the following problems:

1. **Computational complexity in multidimensional data modeling**
   - When Transformers handle multidimensional data (such as images and videos), the self-attention mechanism incurs computational complexity that grows quadratically with the sequence length. This makes it hard for the model to scale to longer sequences.
   - The earlier Mamba architecture achieves linear complexity through State Space Models (SSMs), but it has been shown to perform well mainly on 1D text sequences. How to extend it to multidimensional data remained an open question.
2. **Effective methods for processing multidimensional data**
   - A new design, Mamba-ND, is proposed. It processes multidimensional data by alternately unfolding the input along different dimensions. This achieves performance comparable to or better than existing Transformer models while maintaining a lower parameter count and linear complexity.
3. **Comparative study of different design choices**
   - Extensive ablation experiments were conducted on several candidate designs, including a bidirectional design (Bi-SSM), a multidirectional design (ND-SSM), and a multi-head design (Multi-Head-SSM). The alternating-direction design was ultimately found to be the simplest and most effective.

Through these studies, the authors aim to provide a general and efficient framework for handling multidimensional data tasks, including image classification, action recognition, weather forecasting, and 3D segmentation.
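The idea of alternately unfolding the input along different dimensions can be illustrated with a small sketch. It only shows how an N-D grid of tokens might be flattened into different 1D scan orders, one per layer. The function names (`scan_order`, `layer_schedule`) and the specific schedule (cyclic rotations of the axes, each scanned forward and then backward) are assumptions made for illustration, not the authors' implementation, and no SSM layer is modeled here.

```python
from itertools import product


def scan_order(shape, axis_order, reverse=False):
    """Return flat row-major indices of an N-D grid in a given scan order.

    axis_order[0] is the slowest-varying axis and axis_order[-1] the
    fastest, so axis_order == (0, 1, ..., N-1) is the usual row-major scan.
    """
    ndim = len(shape)
    # Strides of the standard row-major layout of `shape`.
    strides = [1] * ndim
    for i in range(ndim - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]

    order = []
    for idx in product(*(range(shape[a]) for a in axis_order)):
        # idx[k] is the coordinate along original axis axis_order[k].
        order.append(sum(idx[k] * strides[axis_order[k]] for k in range(ndim)))
    return order[::-1] if reverse else order


def layer_schedule(shape):
    """Hypothetical schedule: each cyclic rotation of the axes, scanned
    forward and then backward. Layer i would use schedule[i % len(schedule)]."""
    ndim = len(shape)
    schedule = []
    for r in range(ndim):
        axis_order = tuple(range(r, ndim)) + tuple(range(r))
        schedule.append(scan_order(shape, axis_order, reverse=False))
        schedule.append(scan_order(shape, axis_order, reverse=True))
    return schedule


if __name__ == "__main__":
    shape = (2, 3)  # a tiny 2 x 3 "image" of tokens
    tokens = ["a", "b", "c", "d", "e", "f"]  # stored in row-major order
    for layer, order in enumerate(layer_schedule(shape)):
        # Each layer sees the same tokens, reordered into a different 1D sequence.
        print(layer, [tokens[i] for i in order])
```

Every order is a permutation of the same tokens, so each layer still processes a 1D sequence whose length equals the number of tokens; only the direction in which neighbouring tokens are visited changes from layer to layer.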
