Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi
2024-05-29
Abstract: Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques including architectural modifications, such as modified positional encoding and altered attention mechanisms, which are designed to enhance the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning and inference. This enables LLMs to efficiently process extended sequences. The limitations of the current methodologies are discussed in the last section along with suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper surveys techniques for extending the sequence length that large language models (LLMs) can handle. Although LLMs perform well at understanding context, logical reasoning, and generating responses, their heavy computational and memory requirements limit how well they support long input sequences. The paper reviews architectural modifications, such as modified positional encoding and altered attention mechanisms, that improve long-sequence processing while avoiding a proportional increase in computational requirements. These methods can be applied during training, fine-tuning, and inference. Specific techniques covered include positional interpolation and extrapolation, which let a model handle sequences longer than those seen during training; context window segmentation and sliding, which process a long input in smaller segments or through a moving window; and prompt compression, which shortens input prompts while retaining key information. The paper also covers attention approximation (low-rank decomposition, sparse patterns, and attention-free transformers) and model compression (quantization and pruning), both of which reduce computational and memory requirements. It closes with the limitations of these methods and directions for future research, emphasizing the importance of sequence length in the continued development of LLMs.
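To make the idea of positional interpolation concrete, here is a minimal sketch in Python. The summary above gives no formula, so the sketch rests on two assumptions: positions are rescaled linearly into the range seen during training, and they feed rotary-style angles. The function names `interpolate_positions` and `rope_angles`, and the default `base` value, are hypothetical choices for illustration and are not taken from the paper.

```python
def interpolate_positions(seq_len, trained_len):
    """Rescale positions 0..seq_len-1 so they stay inside [0, trained_len).

    Assumption: a simple linear rescale. Inputs no longer than the
    trained length are left unchanged (scale factor 1.0).
    """
    scale = min(1.0, trained_len / seq_len)
    return [p * scale for p in range(seq_len)]


def rope_angles(position, dim, base=10000.0):
    """Rotary-style angles for one (possibly fractional) position.

    Assumption: dim is even and pair i uses frequency base ** (-2 * i / dim).
    """
    return [position / base ** (2 * i / dim) for i in range(dim // 2)]


# A model assumed to be trained on 2048 positions, given an 8192-token input.
positions = interpolate_positions(seq_len=8192, trained_len=2048)
print(max(positions))                       # largest rescaled position
print(rope_angles(positions[4], dim=8)[0])  # angle of the first pair at token 4
```

The rescaled positions are fractional, so the model sees position values inside its trained range rather than unseen larger ones. Extrapolation, by contrast, would pass the raw positions through unchanged.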