Abstract: Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques including architectural modifications, such as modified positional encoding and altered attention mechanisms, which are designed to enhance the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning and inference. This enables LLMs to efficiently process extended sequences. The limitations of the current methodologies are discussed in the last section along with suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.

What problem does this paper attempt to address?

This paper surveys techniques for extending the ability of large language models (LLMs) to handle long sequences. Although LLMs perform well at understanding context, logical reasoning, and generating responses, they demand substantial computational and memory resources, which limits how well they support long input sequences. The paper reviews architectural modifications, such as modified positional encoding and altered attention mechanisms, that improve long-sequence processing while avoiding a proportional increase in computational requirements. These methods can be applied during the training, fine-tuning, and inference stages of LLMs.

The specific techniques covered include positional interpolation and extrapolation, which let a model handle sequences longer than those seen during training; context window segmentation and sliding, which process long inputs in smaller segments or by moving the window across the input; and prompt compression, which shortens input prompts while retaining key information. The paper also discusses attention approximation techniques (low-rank decomposition, sparse patterns, and attention-free transformers) and model compression techniques such as quantization and pruning, which reduce computational and memory requirements. Finally, it discusses the limitations of these methods and proposes future research directions, emphasizing the importance of sequence length in the continued development of LLMs.
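To make the idea of positional interpolation concrete, here is a minimal sketch under stated assumptions. It assumes one common formulation, linear rescaling of position indices so that a longer input maps back into the position range seen during training, and it assumes a rotary-style positional encoding for the angle computation. The function names, the `base` value, and the example lengths are illustrative choices, not details taken from the survey, which may cover other variants.

```python
def interpolate_positions(seq_len, trained_len):
    """Rescale indices 0..seq_len-1 so they stay inside [0, trained_len).

    Inputs no longer than the trained length are left unchanged
    (an assumption of this sketch).
    """
    scale = min(1.0, trained_len / seq_len)
    return [p * scale for p in range(seq_len)]


def rotary_angles(position, dim, base=10000.0):
    """Rotation angles a rotary-style encoding would assign to one position.

    The position may be fractional, which is what interpolation relies on.
    """
    return [position / base ** (2 * i / dim) for i in range(dim // 2)]


# Hypothetical lengths: a model trained on 2048 positions, given 8192 tokens.
positions = interpolate_positions(seq_len=8192, trained_len=2048)
angles_last = rotary_angles(positions[-1], dim=8)
```

With these example lengths every rescaled position stays below 2048, so the model only encounters position values inside the range it was trained on, at the cost of neighbouring tokens being placed closer together.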

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey

Long-context LLMs Struggle with Long In-context Learning

XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference

Extending Context Window of Large Language Models via Semantic Compression

Language Models can Self-Lengthen to Generate Long Texts

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

A Controlled Study on Long Context Extension and Generalization in LLMs

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

Efficient Large Language Models: A Survey

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Large Language Models Can Self-Improve in Long-context Reasoning

A Survey of Large Language Models

Empower Your Model with Longer and Better Context Comprehension

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

LooGLE: Can Long-Context Language Models Understand Long Contexts?

A Comprehensive Overview of Large Language Models

Why Does the Effective Context Length of LLMs Fall Short?