MemLong: Memory-Augmented Retrieval for Long Text Modeling

Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang
2024-08-30
Abstract: Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention mechanisms and the growing memory consumption of the key-value cache during generation. This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context language modeling by utilizing an external retriever for historical information retrieval. MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model and introduces a fine-grained, controllable retrieval attention mechanism that leverages semantic-level relevant chunks. Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs. More importantly, MemLong can extend the context length on a single 3090 GPU from 4k up to 80k. Our code is available at https://github.com/Bui1dMySea/MemLong
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper "MemLong: Memory-Augmented Retrieval for Long Text Modeling" addresses the challenges that large language models (LLMs) face when processing long texts. Specifically, existing LLMs run into the following issues with long contexts:

1. **High Time and Space Complexity**: Standard attention mechanisms have time and space complexity that is quadratic in the sequence length, which makes processing long texts slow and memory-intensive.
2. **High Memory Consumption for Caching**: During generation, the memory consumed by the key-value cache grows with the context length, which can lead to out-of-memory (OOM) errors.
3. **Limited Model Capability**: Some existing methods reduce computational complexity, but often at the cost of model performance.

To address these issues, the authors propose MemLong, a method that enhances long text modeling through an external retriever. The main contributions of MemLong include:

- **Distribution Consistency**: The distribution of information stored in memory remains consistent, which avoids the distribution shifts caused by changes in model parameters.
- **Training Efficiency**: The lower layers of the model are frozen and only the upper layers are fine-tuned, which significantly reduces computational cost.
- **Extended Context Window**: The context length can be extended from 4k to 80k on a single 3090 GPU, which significantly improves the model's ability to handle long texts.

### Summary

MemLong combines a non-differentiable retrieval module with a partially trainable decoder-only language model and introduces a fine-grained, controllable retrieval attention mechanism that leverages semantically relevant chunks. Experimental results show that MemLong consistently outperforms other state-of-the-art LLMs on multiple long-context language modeling benchmarks.
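
To make the idea of retrieving semantically relevant chunks from an external memory more concrete, here is a minimal sketch in Python. It is not the authors' implementation (see the linked repository for that). The names `ChunkMemory`, `add`, `retrieve`, and `cosine` are hypothetical, the two-dimensional embeddings are toy values, and the string payloads are placeholders for whatever per-chunk information a real memory would store. The sketch only illustrates the general pattern: store one embedding per past chunk, then return the payloads of the chunks whose embeddings are most similar to the current query.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ChunkMemory:
    """Hypothetical chunk-level memory: one embedding and one payload per stored chunk."""
    embeddings: list = field(default_factory=list)
    payloads: list = field(default_factory=list)

    def add(self, embedding, payload):
        # Store a copy of the embedding alongside its payload.
        self.embeddings.append(list(embedding))
        self.payloads.append(payload)

    def retrieve(self, query, k=2):
        # Rank stored chunks by similarity to the query and return the top-k payloads.
        ranked = sorted(
            range(len(self.embeddings)),
            key=lambda i: cosine(query, self.embeddings[i]),
            reverse=True,
        )
        return [self.payloads[i] for i in ranked[:k]]


memory = ChunkMemory()
memory.add([1.0, 0.0], "chunk-0")
memory.add([0.0, 1.0], "chunk-1")
memory.add([0.9, 0.1], "chunk-2")
print(memory.retrieve([1.0, 0.0], k=2))  # prints ['chunk-0', 'chunk-2']
```

Because the lookup itself involves no gradient computation, a retriever of this kind can sit outside the trainable part of the model, which is consistent with the "non-differentiable" retrieval module described above. A linear scan is used here only for brevity; it is an assumption of this sketch, not a statement about how MemLong indexes its memory.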