Parallel Context Windows for Large Language Models

Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

2023-08-02

Abstract: When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ai21labs/parallel-context-windows.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the constraint that a fixed context window places on large language models (LLMs) when they process long text. Existing approaches typically require training specialized architectures, so they cannot easily be applied to off-the-shelf LLMs. The paper proposes Parallel Context Windows (PCW), which alleviates the limitation without any further training. PCW splits a long context into multiple "windows," restricts the attention mechanism to operate only within each window, and reuses the same positional embeddings across windows. This lets an off-the-shelf LLM take in text longer than its original context window. The main experiments evaluate PCW on in-context learning with models ranging from 750 million to 178 billion parameters and report substantial improvements on tasks with diverse input and output spaces. The paper also reports benefits in two further settings: multi-hop question answering and retrieval-augmented question answering with multiple retrieved documents.
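The two mechanisms named above, window-restricted attention and reused positions, can be illustrated with a small sketch. This is not the authors' implementation: the function name `pcw_layout` and its parameters are hypothetical, and the treatment of the tokens that follow the windows (they attend to every window and continue counting positions after the longest window) is an assumption about how the final prompt is attached, not something stated in the summary.

```python
def pcw_layout(window_lengths, n_task_tokens):
    """Sketch of a PCW-style layout (hypothetical helper, not the paper's code).

    Returns (position_ids, mask), where mask[i][j] is True when token i
    is allowed to attend to token j.
    """
    n_context = sum(window_lengths)
    total = n_context + n_task_tokens
    position_ids = []
    mask = [[False] * total for _ in range(total)]

    start = 0
    for length in window_lengths:
        # Every window reuses the same positions 0..length-1.
        position_ids.extend(range(length))
        # Causal attention restricted to tokens inside the same window.
        for i in range(start, start + length):
            for j in range(start, i + 1):
                mask[i][j] = True
        start += length

    # Assumption: trailing task tokens see all windows and earlier task
    # tokens, and their positions continue after the longest window.
    first_task_pos = max(window_lengths, default=0)
    for k in range(n_task_tokens):
        i = n_context + k
        position_ids.append(first_task_pos + k)
        for j in range(i + 1):
            mask[i][j] = True

    return position_ids, mask


positions, mask = pcw_layout([3, 3], n_task_tokens=2)
print(positions)  # [0, 1, 2, 0, 1, 2, 3, 4]
```

With two windows of three tokens each, the largest position used is 4 even though eight tokens are present, which is how position reuse keeps the input inside the range of positions the model saw during training.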

Long-Context Language Modeling with Parallel Context Encoding

Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration

Exploring Context Window of Large Language Models via Decomposed Positional Vectors

Retrieval meets Long Context Large Language Models

Naive Bayes-based Context Extension for Large Language Models

LLoCO: Learning Long Contexts Offline

Visual Context Window Extension: A New Perspective for Long Video Understanding

Extending LLMs' Context Window with 100 Samples

Training-Free Long-Context Scaling of Large Language Models

A Controlled Study on Long Context Extension and Generalization in LLMs

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding

Long-context LLMs Struggle with Long In-context Learning

Why Does the Effective Context Length of LLMs Fall Short?

Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

Extending Context Window of Large Language Models via Semantic Compression