LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu

2024-07-11

Abstract: It is well known that LLMs cannot generalize well to long contexts whose lengths are larger than the training sequence length. This poses challenges when employing LLMs for processing long input sequences during inference. In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. To achieve this goal, we propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information: the grouped attention and the neighbor attention. The grouped attention captures the dependencies among tokens that are far apart, while neighbor attention captures dependencies among adjacent tokens within a specified range. The two-level attentions are computed based on the original model's self-attention mechanism during inference. With minor code modification, our SelfExtend can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. The code can be found at https://github.com/datamllab/LongLM.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper addresses the performance degradation of large language models (LLMs) on inputs longer than the context window used in pre-training. Once the input sequence exceeds that length, the model's behavior becomes unpredictable and its perplexity (PPL) rises sharply. To tackle this, the authors propose SelfExtend, a method that extends the context window of an LLM without fine-tuning. SelfExtend builds a two-level attention mechanism: Grouped Attention, which captures dependencies between distant tokens, and Neighbor Attention, which captures dependencies between adjacent tokens within a specified range. Both levels are computed at inference time from the original model's self-attention mechanism. In this way SelfExtend extends the usable context window of existing LLMs and improves their handling of long texts. Experimental results show that SelfExtend performs well on multiple benchmarks and even surpasses existing fine-tuning methods on certain tasks. This suggests that LLMs inherently possess the capability to handle long contexts, which can be activated with appropriate methods.
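The two attention levels can be illustrated by how a relative position between a query token and a key token might be remapped. The sketch below is a minimal illustration, not the authors' implementation: it assumes that grouped attention maps positions with floor division by a group size and that a constant shift joins the two regions, and the function and parameter names (`self_extend_relative_position`, `group_size`, `neighbor_window`) are hypothetical.

```python
def self_extend_relative_position(q_pos, k_pos, group_size, neighbor_window):
    """Illustrative relative position for one query/key pair (causal attention).

    Assumption: nearby tokens keep their exact distance (neighbor attention),
    while distant tokens use floor-divided positions (grouped attention).
    """
    distance = q_pos - k_pos
    if distance < 0:
        raise ValueError("causal attention: the key must not come after the query")

    # Neighbor attention: exact relative distance inside the window.
    if distance < neighbor_window:
        return distance

    # Grouped attention: coarse positions, shifted so the grouped region
    # continues where the neighbor region ends.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size - k_pos // group_size + shift


def max_relative_position(seq_len, group_size, neighbor_window):
    """Largest relative position produced for a sequence of seq_len tokens."""
    return self_extend_relative_position(seq_len - 1, 0, group_size, neighbor_window)


# A 16384-token input with group size 8 and a 1024-token neighbor window
# yields relative positions no larger than 2943.
print(max_relative_position(16384, group_size=8, neighbor_window=1024))  # 2943
```

Under these assumptions, every relative position the model sees for a 16384-token input stays below 4096, a range a model pre-trained with a 4096-token window has already encountered. This is the intuition for why no fine-tuning is needed.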

Extending LLMs' Context Window with 100 Samples

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

A Controlled Study on Long Context Extension and Generalization in LLMs

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

ReAttention: Training-Free Infinite Context with Finite Attention Scope

Long-context LLMs Struggle with Long In-context Learning

XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference

Visual Context Window Extension: A New Perspective for Long Video Understanding

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Make Your LLM Fully Utilize the Context

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

Extensible Embedding: A Flexible Multipler For LLM's Context Length

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

LongHeads: Multi-Head Attention is Secretly a Long Context Processor

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training