Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang
2024-06-10
Abstract: Efforts to extend the context length of Transformer-based large language models (LLMs) and improve their comprehension capabilities are often limited by computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also investigate the issue of poor model responses when both instructions and context are compressed in downstream tasks, and propose an instruction reconstruction method to mitigate this problem. We validated the effectiveness of our approach on multiple tasks, achieving a compression rate of up to 32x on text reconstruction tasks with a BLEU4 score close to 0.95, and nearly 100% accuracy on a passkey retrieval task with a sequence length of 1M. Finally, our method demonstrated competitive performance in long-text question-answering tasks compared to non-compressed methods, while significantly saving storage resources in long-text inference tasks. Our code, models, and demo are available at https://github.com/WUHU-G/RCC_Transformer
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to expand the context window of Transformer-based large language models (LLMs) and improve their comprehension of long inputs when storage space is limited. Existing approaches to longer contexts are constrained by computational resources and memory capacity. The paper proposes Recurrent Context Compression (RCC), which compresses the context with an autoencoder structure so that more of the input fits in the same storage while limiting information loss.
The paper also proposes a training method for adapting a context compression language model to long text: a recurrent compression mechanism processes the input piece by piece, which overcomes the context window limitation of the encoder itself.
A second problem arises in downstream tasks. When both the instruction and the context are compressed, the model often fails to follow the instruction correctly and response quality degrades. To mitigate this, the paper uses the text reconstruction capability of the compression model to reconstruct the content of the instruction, which significantly improves output quality when both are compressed.
Experiments report a compression ratio of up to 32x on text reconstruction tasks with a BLEU4 score close to 0.95, and nearly 100% accuracy on a passkey retrieval task with a sequence length of 1M. On long-text question-answering tasks, RCC is competitive with non-compressed methods while significantly reducing storage use during long-text inference.
In summary, the paper targets the efficiency, scalability, and instruction-following problems of existing context compression methods for long-text processing.
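To make the recurrent compression idea and the 32x figure concrete, the following is a minimal sketch, not the authors' implementation. It splits a long input into segments that fit a fixed encoder window, compresses each segment in turn, and appends the results to one compact memory. The names `compress_segment`, `recurrent_compress`, and `compressed_length`, the window size of 2048, and the use of mean pooling in place of the learned encoder are all assumptions made for illustration. The real encoder is a trained model and may carry state between segments, which this toy version does not.

```python
from typing import List, Sequence


def compress_segment(segment: Sequence[float], ratio: int) -> List[float]:
    """Hypothetical stand-in for a learned encoder.

    Mean-pools every `ratio` consecutive values into one value, so the
    output is roughly `ratio` times shorter than the input.
    """
    pooled = []
    for i in range(0, len(segment), ratio):
        group = segment[i:i + ratio]
        pooled.append(sum(group) / len(group))
    return pooled


def recurrent_compress(tokens: Sequence[float], window: int, ratio: int) -> List[float]:
    """Compress a long sequence one encoder-window-sized segment at a time.

    `window` is the assumed maximum input length of the encoder; the
    compressed segments are concatenated into a single memory.
    """
    if window <= 0 or ratio <= 0:
        raise ValueError("window and ratio must be positive")
    memory: List[float] = []
    for start in range(0, len(tokens), window):
        memory.extend(compress_segment(tokens[start:start + window], ratio))
    return memory


def compressed_length(n_tokens: int, ratio: int) -> int:
    """Number of compressed slots needed for n_tokens (ceiling division)."""
    return -(-n_tokens // ratio)


if __name__ == "__main__":
    tokens = [float(i) for i in range(4096)]
    memory = recurrent_compress(tokens, window=2048, ratio=32)
    print(len(tokens), "->", len(memory))
    print("slots for a 1M-token input at 32x:", compressed_length(1_000_000, 32))
```

Under these assumptions, a 1M-token input at a 32x ratio occupies 1,000,000 / 32 = 31,250 compressed slots, which is the sense in which compression saves storage during long-text inference.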