Abstract: The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains, including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models, in which one trains in parallel on the entire sequence via an IO-aware parallel scan, and deploys in an autoregressive manner. We add a third view and show that such models can be viewed as attention-driven models. This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers and allows us to peer inside the inner workings of the Mamba model with explainability methods. Our code is publicly available.

What problem does this paper attempt to address?

The paper attempts to address the following key issues:

1. **Understanding the mechanism of information flow**: Although the Mamba model performs well in various fields (such as natural language processing, image processing, and video processing), the dynamics of its internal information flow and its learning mechanism remain unclear. Specifically, how the Mamba model captures dependencies, and how closely this resembles RNNs, CNNs, or attention mechanisms, has not been fully answered.
2. **Improving interpretability**: There is currently a lack of interpretability methods for the Mamba model, which not only makes debugging difficult but also limits its application in socially sensitive areas that require interpretability. The paper therefore aims to provide insights into the dynamics of the Mamba model and to develop methods for explaining these models.
3. **Comparing characteristics of different models**: By introducing hidden attention matrices, the paper hopes to directly compare the characteristics and internal representations of Transformers and the Mamba model.

To achieve these goals, the main contributions of the paper include:

- **Revealing the fundamental properties of the Mamba model**: The paper demonstrates that the Mamba model relies on an implicit attention mechanism implemented by a unique data-controlled linear operator.
- **Discovering more attention matrices**: Analysis shows that the Mamba model generates three orders of magnitude more attention matrices than Transformers.
- **Providing interpretability tools**: Based on these hidden attention matrices, the paper provides interpretability and explanation tools for the Mamba model.
- **Theoretical analysis**: The paper analyzes how attention capabilities evolve in state space models and what expressive power they have, leading to a deeper understanding of why the Mamba model is effective.

In summary, the paper aims to fill the research gap in the dynamics of information flow and interpretability of the Mamba model by introducing new perspectives and methods.
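
The idea of an implicit attention matrix can be illustrated with a small NumPy sketch. It is not the paper's code: it assumes a single channel, a diagonal per-step transition `A_bar`, and already-discretized, input-dependent `B_bar` and `C`, and it leaves out the gating, convolution, and discretization steps of a full Mamba layer. All function and variable names are hypothetical. Unrolling the recurrence h_t = A_bar_t * h_{t-1} + B_bar_t * x_t, y_t = C_t · h_t gives y_i = sum over j <= i of alpha[i, j] * x_j, where alpha[i, j] = C_i · (product of A_bar_k for k = j+1..i) * B_bar_j.

```python
import numpy as np

def hidden_attention(A_bar, B_bar, C):
    """Build the (L, L) implicit attention matrix of a one-channel selective SSM.

    A_bar, B_bar, C have shape (L, N): one diagonal transition, input
    projection, and output projection per time step (illustrative assumption).
    """
    L, N = A_bar.shape
    alpha = np.zeros((L, L))
    for i in range(L):
        # Running product of A_bar[k] for k = j+1..i (empty product when j == i).
        decay = np.ones(N)
        for j in range(i, -1, -1):
            alpha[i, j] = np.sum(C[i] * decay * B_bar[j])
            decay = decay * A_bar[j]
    return alpha

def recurrent_scan(A_bar, B_bar, C, x):
    """Reference output computed step by step with the linear recurrence."""
    L, N = A_bar.shape
    h = np.zeros(N)
    y = np.zeros(L)
    for t in range(L):
        h = A_bar[t] * h + B_bar[t] * x[t]
        y[t] = np.dot(C[t], h)
    return y

# Toy data standing in for the input-dependent parameters.
rng = np.random.default_rng(0)
L, N = 6, 4
A_bar = rng.uniform(0.5, 0.99, size=(L, N))
B_bar = rng.normal(size=(L, N))
C = rng.normal(size=(L, N))
x = rng.normal(size=L)

alpha = hidden_attention(A_bar, B_bar, C)
y_attn = alpha @ x                        # attention-style view
y_rec = recurrent_scan(A_bar, B_bar, C, x)  # recurrent view
print(np.allclose(y_attn, y_rec))
```

In this simplified setting the matrix is lower-triangular (causal), and each channel of a layer would yield its own matrix, which is one way to read the summary's claim that Mamba produces far more attention matrices than a Transformer.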

The Hidden Attention of Mamba Models

Demystify Mamba in Vision: A Linear Attention Perspective

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

An Empirical Study of Mamba-based Language Models

Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation

DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models

Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

Taipan: Efficient and Expressive State Space Language Models with Selective Attention

Revealing and Mitigating the Local Pattern Shortcuts of Mamba

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

A Survey on Visual Mamba

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

VideoMamba: Spatio-Temporal Selective State Space Model

PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

MHS-VM: Multi-Head Scanning in Parallel Subspaces for Vision Mamba

Mamba in Speech: Towards an Alternative to Self-Attention

Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces

Mamba State-Space Models Are Lyapunov-Stable Learners