Abstract: The length generalization failure problem, namely that a large language model (LLM) fails to generalize to texts longer than its maximum training length, greatly restricts the application of LLMs in scenarios with streaming long inputs. Existing methods that address this problem either require substantial costs or introduce precision loss. In this paper, we empirically find that the accuracy of an LLM's prediction is highly correlated with its certainty. Based on this, we propose an efficient training-free framework, named XL3M (extra-long large language model), which enables LLMs trained on short sequences to reason over extremely long sequences without any further training or fine-tuning. Under the XL3M framework, the input context is first decomposed into multiple short sub-contexts, where each sub-context contains an independent segment and a common "question", which consists of a few tokens from the end of the original context. XL3M then measures the relevance between each segment and the "question" and constructs a concise key context by splicing all the relevant segments in chronological order. This key context is then used instead of the original context to complete the inference task. Evaluations on comprehensive benchmarks show the superiority of XL3M. Using our framework, a Llama2-7B model is able to reason over 20M-long sequences on an 8-card Huawei Ascend 910B NPU machine with 64GB of memory per card.
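The decomposition and splicing steps described in the abstract can be sketched in Python as follows. This is a minimal illustration over token IDs, not the authors' implementation. The names `decompose`, `build_key_context`, and `overlap_score`, the fixed segment length, the top-k selection rule, and appending the question to the end of the key context are all assumptions. In XL3M the relevance of a segment comes from the LLM's certainty on its sub-context (segment plus question); `overlap_score` here is a toy stand-in so the example runs without a model.

```python
from typing import Callable, List, Sequence, Tuple


def decompose(tokens: Sequence[int], seg_len: int, q_len: int) -> Tuple[List[List[int]], List[int]]:
    """Split a context into (segments, question).

    The question is the last q_len tokens; the remaining prefix is cut into
    consecutive segments of at most seg_len tokens (assumed fixed-size chunking).
    """
    if seg_len <= 0 or q_len <= 0:
        raise ValueError("seg_len and q_len must be positive")
    question = list(tokens[-q_len:])
    body = list(tokens[:-q_len])
    segments = [body[i:i + seg_len] for i in range(0, len(body), seg_len)]
    return segments, question


def build_key_context(
    tokens: Sequence[int],
    seg_len: int,
    q_len: int,
    score: Callable[[List[int], List[int]], float],
    top_k: int,
) -> List[int]:
    """Keep the top_k most relevant segments, splice them in their original
    (chronological) order, and append the question (assumed, so the model
    continues from the original ending)."""
    segments, question = decompose(tokens, seg_len, q_len)
    # Hypothetical scorer: a real one would run the LLM on segment + question.
    scores = [score(seg, question) for seg in segments]
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:top_k])  # restore chronological order
    key = [tok for i in keep for tok in segments[i]]
    return key + question


def overlap_score(segment: List[int], question: List[int]) -> float:
    """Toy relevance stand-in: count distinct question tokens present in the segment."""
    return float(len(set(segment) & set(question)))


tokens = [1, 7, 2, 3, 4, 5, 7, 9, 6, 7, 9]
print(build_key_context(tokens, seg_len=3, q_len=2, score=overlap_score, top_k=2))
# [1, 7, 2, 7, 9, 6, 7, 9]
```

Selecting by rank and then re-sorting the kept indices is what preserves the chronological order of the spliced segments; a threshold on the score would be an equally plausible selection rule.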

What problem does this paper attempt to address?

The paper "XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference" addresses the length generalization failure problem: large language models (LLMs) fail to generalize to texts longer than their maximum training length. This limitation restricts the use of LLMs in scenarios with long inputs, such as multi-turn dialogue, dialogue guidance, and document summarization. Existing methods either incur significant costs, such as continued training or fine-tuning, or lose accuracy. The researchers found that the accuracy of an LLM's predictions is highly correlated with its certainty. Building on this, they propose XL3M, an efficient framework that requires no additional training and lets LLMs trained on short sequences process extremely long sequences. XL3M decomposes the input context into multiple short sub-contexts, each containing an independent segment and a common "question", and then constructs a concise key context by measuring the relevance between each segment and the "question". This key context replaces the original context during inference. Filtering out irrelevant context in this way lets the LLM generate high-quality results from the extracted key context.

The main contributions of the paper are:
1. Proposing the XL3M framework, showing that the accuracy of LLM predictions is highly correlated with their certainty (measured by entropy), and using this principle to achieve length extension without training.
2. Evaluating XL3M on comprehensive benchmarks and the widely used "needle in a haystack" task, demonstrating better performance than other state-of-the-art methods, both fine-tuning-based and fine-tuning-free.
3. Showing that XL3M does not modify the basic structure of LLMs, requires no additional training or fine-tuning, and is efficient in both time and memory: it can process 20M-long sequences on an 8-card Huawei Ascend 910B NPU machine.

The paper also reviews existing length extension techniques, including fine-tuning-based, fine-tuning-free, and external-memory-based methods, and analyzes their limitations and effectiveness.
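The certainty measure mentioned in the first contribution can be sketched as the entropy of the model's next-token distributions. The sketch below assumes, hypothetically, that the LLM has already produced logits at each position where it predicts a question token given one sub-context. It turns them into a relevance score by averaging the per-position entropy and negating it, so a more certain (lower-entropy) sub-context scores higher. The names `softmax`, `entropy`, and `certainty_score` and the averaging rule are assumptions, not the paper's exact formula.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def certainty_score(question_logits: Sequence[Sequence[float]]) -> float:
    """Hypothetical relevance score for one sub-context: the negative mean
    entropy over the question positions. Higher means more certain."""
    if not question_logits:
        raise ValueError("need logits for at least one question position")
    ents = [entropy(softmax(row)) for row in question_logits]
    return -sum(ents) / len(ents)


uniform = [[0.0, 0.0, 0.0, 0.0]]   # maximally uncertain over 4 tokens
peaked = [[10.0, 0.0, 0.0, 0.0]]   # nearly all mass on one token
print(certainty_score(uniform), certainty_score(peaked))
```

Sharper distributions yield higher scores: uniform logits over four tokens give -ln 4 (about -1.386), while the strongly peaked row scores close to zero.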

XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

E²-LLM: Efficient and Extreme Length Extension of Large Language Models

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Why Does the Effective Context Length of LLMs Fall Short?

LLM×MapReduce: Simplified Long-Sequence Processing Using Large Language Models

Training-Free Long-Context Scaling of Large Language Models

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Long-context LLMs Struggle with Long In-context Learning

CLEX: Continuous Length Extrapolation for Large Language Models

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

Language Models can Self-Lengthen to Generate Long Texts

A Controlled Study on Long Context Extension and Generalization in LLMs

Make Your LLM Fully Utilize the Context

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

XL²Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies

Length Controlled Generation for Black-box LLMs