ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao
2024-09-01
Abstract: While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of the Mamba models and propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process, incurring minimal additional inference cost. Experimental results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy, improving over the baselines by 3.2 and 1.6 points, respectively, and attaining performance almost on par with same-size transformer models.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the long-context efficiency issues of the Mamba model, that is, its limited ability to comprehend long text sequences. Specifically:

1. **Problem Background**: Although the Mamba architecture performs well on short-context natural language processing (NLP) tasks, its performance is significantly inferior to that of Transformer-based models when handling long sequences (e.g., over 2k tokens).
2. **Research Objective**: To improve the Mamba model's performance on long-sequence tasks, the authors propose the ReMamba method. ReMamba applies selective compression and adaptation techniques across two forward passes to reduce information degradation, thereby enhancing the model's understanding of long sequences.
3. **Core Contribution**: ReMamba enhances the Mamba model's ability to handle long sequences while adding only minimal inference overhead, bringing its performance almost on par with Transformer models of the same size. Experimental results show that ReMamba improves over the baselines by 3.2 and 1.6 points on the LongBench and L-Eval benchmarks, respectively.

In summary, the paper aims to overcome the Mamba model's limitations in processing long text sequences by introducing the ReMamba method, thereby improving its performance in practical applications.
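To make the idea of selective compression more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the text above does not say how states are scored or selected, so the scoring rule (cosine similarity to the final hidden state), the helper names `cosine` and `select_top_k`, and the toy vectors are all illustrative assumptions. The sketch only shows the general shape of "score the states from a first pass, keep the top k, and pass the shortened sequence to a second pass".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_top_k(hidden_states, k):
    """Keep the k states most similar to the final state, in original order.

    ASSUMPTION: scoring by similarity to the last state is a stand-in chosen
    for illustration; the summary above does not specify the scoring rule.
    """
    query = hidden_states[-1]
    scores = [cosine(h, query) for h in hidden_states]
    # Rank positions by score, take the best k, then restore sequence order.
    top = sorted(range(len(hidden_states)), key=lambda i: scores[i], reverse=True)[:k]
    keep = sorted(top)
    return keep, [hidden_states[i] for i in keep]


# Toy "first-pass" hidden states (hypothetical 2-dimensional vectors).
states = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [1.0, 0.2]]
kept_idx, compressed = select_top_k(states, k=3)
print(kept_idx)  # positions that would be carried into a second forward pass
```

In this toy setup the compressed sequence is shorter than the input, which is the property that would keep the cost of a second pass low. How the retained states are fed back into the model and adapted is outside the scope of this sketch.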