E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang

2024-09-11

Abstract: In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the difficulty of achieving three goals at once for large language models (LLMs): strong long-context performance, low computational complexity, and compatibility with pretrained models. The authors call this the "impossible triangle." Specifically, the paper aims to improve long-context understanding and reasoning in tasks such as multi-turn dialogue, code generation, and document summarization, while reducing training and inference cost and reusing existing pretrained models. To this end it proposes E2LLM (Encoder Elongated Large Language Models), which combines a pretrained text encoder with a decoder-only LLM. E2LLM splits a long text into chunks, compresses each chunk into embedding vectors with the pretrained text encoder, and uses an adapter to align those vectors with the input space of the decoder-only LLM, where they act as soft prompts. Two training objectives help the LLM interpret these soft prompts and generate accurate answers: reconstruction of the encoder output, and long-context instruction fine-tuning. The reported experiments show that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models.
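
The chunk, encode, and align steps described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a pretrained text encoder, the adapter is assumed to be a single random linear map, each chunk is assumed to yield exactly one vector, and the chunk size and dimensions are arbitrary. The two training objectives are not shown. The point is only that a long token sequence reaches the decoder as a much shorter sequence of soft-prompt vectors.

```python
import random


def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_encoder(chunk, enc_dim):
    """Hypothetical stand-in for a pretrained text encoder.

    Assigns each token a deterministic pseudo-embedding and mean-pools
    them, so one chunk becomes one vector of size enc_dim (an assumption).
    """
    pooled = [0.0] * enc_dim
    for token in chunk:
        rng = random.Random(token)  # same token always gets the same vector
        for d in range(enc_dim):
            pooled[d] += rng.uniform(-1.0, 1.0)
    return [value / len(chunk) for value in pooled]


def make_adapter(enc_dim, llm_dim, seed=0):
    """Assumed adapter: a linear map from encoder space to the LLM input space.

    In a real system these weights would be trained; here they are random.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(enc_dim)] for _ in range(llm_dim)]

    def adapter(vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return adapter


def build_soft_prompt(context_tokens, chunk_size, enc_dim, adapter):
    """Chunk the context, encode each chunk, and project it with the adapter."""
    chunks = split_into_chunks(context_tokens, chunk_size)
    return [adapter(toy_encoder(chunk, enc_dim)) for chunk in chunks]


context = ("long context " * 500).split()  # 1000 whitespace "tokens"
soft_prompt = build_soft_prompt(
    context, chunk_size=128, enc_dim=16, adapter=make_adapter(16, 32)
)
print(len(context), "tokens ->", len(soft_prompt), "soft-prompt vectors of dim", len(soft_prompt[0]))
# prints: 1000 tokens -> 8 soft-prompt vectors of dim 32
```

In this sketch the decoder would see 8 vectors in place of 1000 context tokens, followed by the embeddings of the user's question. That reduction in sequence length is where the efficiency gain described in the summary would come from.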

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Long-Context Language Modeling with Parallel Context Encoding

Long-context LLMs Struggle with Long In-context Learning

XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference

LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking

Large Language Models Can Self-Improve in Long-context Reasoning

LongVLM: Efficient Long Video Understanding via Large Language Models

LooGLE: Can Long-Context Language Models Understand Long Contexts?

Why Does the Effective Context Length of LLMs Fall Short?

CLEX: Continuous Length Extrapolation for Large Language Models

Empower Your Model with Longer and Better Context Comprehension

CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory

Training-Free Long-Context Scaling of Large Language Models

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

Extensible Embedding: A Flexible Multipler For LLM's Context Length

FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding

LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models

Visual Context Window Extension: A New Perspective for Long Video Understanding