LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu
2024-07-11
Abstract: It is well known that LLMs cannot generalize well to long contexts whose lengths are larger than the training sequence length. This poses challenges when employing LLMs for processing long input sequences during inference. In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. To achieve this goal, we propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information: the grouped attention and the neighbor attention. The grouped attention captures the dependencies among tokens that are far apart, while neighbor attention captures dependencies among adjacent tokens within a specified range. The two-level attentions are computed based on the original model's self-attention mechanism during inference. With minor code modification, our SelfExtend can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. The code can be found at https://github.com/datamllab/LongLM.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the performance degradation of large language models (LLMs) on inputs longer than the context window used during pre-training. When the input sequence exceeds that length, model behavior becomes unpredictable and perplexity (PPL) rises sharply. To tackle this, the authors propose SelfExtend, a method that extends the context window of an LLM without any fine-tuning. SelfExtend builds a two-level attention mechanism: Grouped Attention captures dependencies between distant tokens, while Neighbor Attention captures dependencies between adjacent tokens within a specified range. Both levels are computed at inference time from the original model's self-attention mechanism, so only minor code changes are required. Experimental results show that SelfExtend performs well on multiple benchmarks and even surpasses existing fine-tuning methods on certain tasks. The authors take this as evidence that LLMs inherently possess the capability to handle long contexts, and that this capability can be activated with appropriate methods.
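To make the two-level scheme concrete, the sketch below shows one way the relative position for a query-key pair could be chosen: the exact distance for nearby tokens, and a coarser, grouped distance for tokens that are far apart. It is a minimal illustration, not the authors' implementation. The function name, the parameters `group_size` and `neighbor_window`, the floor-division grouping, and the shift that aligns the two regions at the window boundary are all assumptions made here for illustration; none of them is spelled out in the summary above.

```python
def self_extend_relative_position(q_pos: int, k_pos: int,
                                  group_size: int, neighbor_window: int) -> int:
    """Pick the relative position for one (query, key) pair.

    Hypothetical sketch of a bi-level scheme:
    - neighbor attention: keys closer than `neighbor_window` keep their
      exact relative distance;
    - grouped attention: more distant keys use positions floor-divided by
      `group_size`, so many true distances map to one coarser distance.
    """
    if k_pos > q_pos:
        raise ValueError("causal attention: the key cannot come after the query")

    distance = q_pos - k_pos
    if distance < neighbor_window:
        # Neighbor region: unchanged relative position.
        return distance

    # Grouped region (assumed form): floor-divide both positions, then shift
    # the query side so the grouped distances continue from the window edge.
    shift = neighbor_window - neighbor_window // group_size
    return (q_pos // group_size + shift) - (k_pos // group_size)


if __name__ == "__main__":
    G, W = 4, 8  # illustrative values only
    print(self_extend_relative_position(10, 5, G, W))   # nearby pair: exact distance, 5
    print(self_extend_relative_position(100, 0, G, W))  # distant pair: compressed to 31
```

Under these assumptions, a true distance of 100 tokens is presented to the model as a relative position of 31. This illustrates how a sequence longer than the pre-training window could be handled using only relative positions the model has already seen, without changing any weights.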