Abstract: Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks such as machine translation and Named Entity Recognition (NER). However, the advent of transformers has shifted this paradigm, given their superior performance. Yet transformers suffer from $O(N^2)$ attention complexity in the sequence length $N$ and from challenges in handling inductive bias. Several variants that use spectral networks or convolutions have been proposed to address these issues and have performed well on a range of tasks, but they still have difficulty dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling in this context, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, and Mamba. In this survey, we categorize the foundational SSMs into three paradigms: gating architectures, structural architectures, and recurrent architectures. The survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medicine (including genomics), chemistry (such as drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets such as Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, and sstv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for the Mamba-360 work is available on this webpage.

Block-State Transformers

Blockwise Parallel Transformer for Large Context Models

Efficient Long Sequence Modeling Via State Space Augmented Transformer

Multi-Head State Space Model for Speech Recognition

Longhorn: State Space Models are Amortized Online Learners

Convolutional State Space Models for Long-Range Spatiotemporal Modeling

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Block-Recurrent Transformers

Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms

Efficiently Modeling Long Sequences with Structured State Spaces

Block Transformer: Global-to-Local Language Modeling for Fast Inference

State Space Model for New-Generation Network Alternative to Transformers: A Survey

Repeat After Me: Transformers are Better than State Space Models at Copying

The Illusion of State in State-Space Models

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding

Sentence-State LSTMs For Sequence-to-Sequence Learning