Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models

Emadeldeen Hamdan, Hongyi Pan, Ahmet Enis Cetin
2024-11-09
Abstract: Recently developed structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and large language models at small to medium scale while addressing their computational inefficiency. In this work, we introduce the concepts of controllability and observability to the original Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Moreover, we reinforce stability on the $n \times n$ $A$ matrix of Mamba2. The Mamba SSM architecture removes the need for the attention layers and multilayer perceptron blocks used in transformers. However, current Mamba models do not reinforce controllability in the state-space equations when computing the $A$, $B$, $C$, and $D$ matrices at each time step, which increases complexity and computational cost. Furthermore, the $A$ matrix in Mamba2 is not always stable. We demonstrate a reduction in parameters compared to the first published Mamba and Mamba2. We show a 5\% improvement in perplexity and a 3\% decrease in training time after reinforcing controllability and observability on the original Mamba architecture in our proposed S-Mamba. We further enforce stability on the $A$ matrix in Mamba2 to improve the loss and perplexity of the model. The controllable and stable $n \times n$ state matrix $A$ is sparse, and it has only $n$ free parameters. Our novel approach will ensure controllable/observable and stable SSMs, which will be the gate key for Mamba3.
Machine Learning, Systems and Control
### What problems does this paper attempt to solve?

This paper aims to solve several key problems encountered by existing Structured State Space Models (SSMs) in Natural Language Processing (NLP) applications, specifically:

1. **Lack of Controllability and Observability**:
   - The current Mamba model does not reinforce controllability and observability in its state-space equations. This increases the complexity of computing the \(A\), \(B\), \(C\), and \(D\) matrices, and with it the computational cost.
   - The paper introduces Sparse Mamba (S-Mamba), which simplifies the computation of these matrices by adding controllability and observability to the original Mamba architecture.
2. **Stability Issues**:
   - In Mamba2, the \(A\) matrix is not always stable. An unstable state matrix may lead to problems such as divergence during training.
   - The paper enforces stability of the \(A\) matrix, which keeps the system stable and improves the performance of the model.
3. **Parameter Redundancy and Computational Efficiency**:
   - The existing Mamba model carries redundant parameters, which lowers its computational efficiency.
   - S-Mamba reduces the number of parameters and the training time by introducing sparsity.
4. **Model Performance Improvement**:
   - The paper shows that by introducing controllability, observability, and stability, S-Mamba improves perplexity by 5% and reduces training time by 3%.

### Specific Improvement Measures

- **Controllability**: By transforming the system into controllable canonical form, the input can more effectively influence the state of the system.
- **Observability**: By transforming the system into observable canonical form, the internal state of the system can be more accurately inferred from the output.
- **Stability**: By adjusting the \(A\) matrix so that all of its eigenvalues are negative real numbers or complex numbers with negative real parts, the stability of the system is guaranteed.

### Experimental Results

The experimental results show that S-Mamba performs better than the original Mamba model on multiple datasets, with a 5% improvement in perplexity and a 3% reduction in training time. The number of parameters is also reduced, which supports the effectiveness of sparsification.

### Conclusion

By introducing controllability, observability, and stability, S-Mamba not only improves the performance of the model but also simplifies the system structure, making it more suitable for tasks with long-sequence dependencies.
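The controllable canonical form and the eigenvalue-based stability condition described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`controllable_canonical`, `controllability_matrix`, `is_stable`) are hypothetical, the system is single-input and continuous-time, and the stable coefficients are obtained here simply by picking roots with negative real parts, which is an assumption made for illustration. It does show why such an \(A\) matrix is sparse with only \(n\) free parameters: everything except the last row is fixed.

```python
import numpy as np

def controllable_canonical(a):
    """Build (A, B) in controllable canonical form.

    `a` holds the n free parameters [a_0, ..., a_{n-1}] of the characteristic
    polynomial s^n + a_{n-1} s^{n-1} + ... + a_0 (hypothetical helper).
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    A = np.zeros((n, n))
    A[np.arange(n - 1), np.arange(1, n)] = 1.0  # fixed superdiagonal of ones
    A[-1, :] = -a                               # last row: the only free entries
    B = np.zeros((n, 1))
    B[-1, 0] = 1.0
    return A, B

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1) B] column-wise."""
    cols = [B]
    for _ in range(A.shape[0] - 1):
        cols.append(A @ cols[-1])
    return np.hstack(cols)

def is_stable(A):
    """Continuous-time stability: every eigenvalue has a negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# Assumption for illustration: choose stable roots, then derive coefficients.
roots = np.array([-1.0, -2.0, -3.0])
coeffs = np.poly(roots)   # highest degree first, leading coefficient 1
a = coeffs[:0:-1]         # reorder to [a_0, ..., a_{n-1}]

A, B = controllable_canonical(a)
rank = np.linalg.matrix_rank(controllability_matrix(A, B))
print("A =\n", A)
print("controllable:", rank == A.shape[0])
print("stable:", is_stable(A))

# The observable canonical form is the dual: transpose A and use C = B^T.
A_obs, C_obs = A.T, B.T
```

A pair built this way is controllable for any choice of coefficients, so only stability depends on the values in the last row; for example, `controllable_canonical([-1.0, 0.0])` gives a controllable but unstable system.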