Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization

Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang

2024-01-16

Abstract: Large language models (LLMs) need sufficient context to handle many critical applications, such as retrieval-augmented generation and few-shot learning. However, because of their constrained window size, LLMs can only access information within a limited context. Although the context window can be extended by fine-tuning, doing so incurs a substantial cost in both the training and inference stages. In this paper, we present Extensible Tokenization as an alternative method that realizes flexible scaling of LLMs' context. Extensible Tokenization acts as middleware between the tokenized context and the LLM, transforming the raw token embeddings into extensible embeddings. Such embeddings provide a more compact representation of the long context, on top of which the LLM is able to perceive more information with the same context window. Extensible Tokenization is also characterized by its flexibility: the scaling factor can be chosen freely within a feasible range, allowing the context to be extended to an arbitrary length at inference time. In addition, Extensible Tokenization is introduced as a drop-in component that can be seamlessly plugged into not only the LLM itself but also its fine-tuned derivatives, bringing in the extended contextual information while fully preserving the LLM's existing capabilities. We perform comprehensive experiments on long-context language modeling and understanding tasks, which verify Extensible Tokenization as an effective, efficient, flexible, and compatible method to extend an LLM's context. Our model and source code will be made publicly available.

Computation and Language

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper aims to address the issue of context window limitations faced by large language models (LLMs) when handling long sequence data. Specifically:

1. **Context Window Limitations**:
   - Existing large language models are constrained by a fixed context window size when dealing with critical tasks such as retrieval-augmented generation and few-shot learning, which prevents them from fully covering the input data.
   - Although the context window can be extended through fine-tuning, this significantly increases the cost during training and inference stages and may compromise the model's original performance on shorter contexts.
2. **Limitations of Existing Methods**:
   - Sparse attention requires custom GPU kernels, which are not supported by standard infrastructure.
   - Stream processing ignores information beyond the context window, and memory compression leads to information loss and incompatibility with existing models.
3. **Proposed Method**:
   - The paper proposes Extensible Tokenization, a novel approach to extend the context capacity of LLMs without modifying the original model architecture.
   - Extensible Tokenization acts as a middleware, converting original token embeddings into compact representations called extensible embeddings, allowing the model to perceive more information within the same context window.
   - This method is highly flexible, strongly compatible, and can effectively enhance the performance of language modeling and understanding tasks in long contexts.

Through this approach, the paper aims to provide an efficient, flexible, and compatible way to extend the context processing capabilities of large language models.
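
To make the idea of compact embeddings concrete, here is a minimal sketch in plain Python. It is not the paper's method: this page does not describe how extensible embeddings are actually computed, so mean pooling is used purely as a stand-in. The names `compress_embeddings` and `raw_tokens_covered` are hypothetical, and so is the reading of the scaling factor `k` as "raw tokens represented by one compact embedding".

```python
from typing import List

Vector = List[float]


def compress_embeddings(token_embeddings: List[Vector], k: int) -> List[Vector]:
    """Average each run of k consecutive token embeddings into one vector.

    ASSUMPTION: mean pooling is only an illustrative stand-in. The paper's
    actual transformation into extensible embeddings is not described here.
    """
    if k < 1:
        raise ValueError("scaling factor k must be >= 1")
    compact: List[Vector] = []
    for start in range(0, len(token_embeddings), k):
        chunk = token_embeddings[start:start + k]  # last chunk may be shorter
        dim = len(chunk[0])
        compact.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return compact


def raw_tokens_covered(window_size: int, k: int) -> int:
    """Raw tokens a fixed window can represent if each slot stands for k tokens.

    ASSUMPTION: this treats the scaling factor as a simple multiplier.
    """
    return window_size * k


# Toy input: 8 token embeddings of dimension 2.
raw = [[float(i), float(2 * i)] for i in range(8)]

# With k = 4, the 8 raw embeddings become 2 compact ones.
compact = compress_embeddings(raw, k=4)
print(len(raw), "->", len(compact))

# Under the same assumption, a 4096-slot window with k = 8 spans this many raw tokens.
print(raw_tokens_covered(4096, 8))
```

Because `k` is an ordinary argument here, the same function can be called with different values at inference time, which mirrors the flexibility the abstract claims for the scaling factor. The quality of the compact representation is what the real method has to get right, and this sketch says nothing about that.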

Extensible Embedding: A Flexible Multipler For LLM's Context Length

CLEX: Continuous Length Extrapolation for Large Language Models

BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Exploring Context Window of Large Language Models via Decomposed Positional Vectors

LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

Long-Context Language Modeling with Parallel Context Encoding

Extending LLMs' Context Window with 100 Samples

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

Why Does the Effective Context Length of LLMs Fall Short?

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Empower Your Model with Longer and Better Context Comprehension

Training-Free Long-Context Scaling of Large Language Models

FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding

Extending Context Window of Large Language Models via Semantic Compression

LongEmbed: Extending Embedding Models for Long Context Retrieval

Extending Context Window of Large Language Models from a Distributional Perspective

Retrieval meets Long Context Large Language Models

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory