E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang
2024-09-11
Abstract: In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the difficulty of achieving three goals at once in large language models (LLMs): strong long-context performance, low computational complexity, and reuse of pretrained models. The authors call this trade-off the "impossible triangle." Specifically, the paper aims to improve LLMs' long-context understanding and reasoning in tasks such as multi-turn dialogue, code generation, and document summarization, while reducing training and inference costs and staying compatible with existing pretrained models. To this end, it proposes E2LLM (Encoder Elongated Large Language Models), an approach that combines a pretrained text encoder with a decoder-only LLM. E2LLM splits a long text into chunks, compresses each chunk into embedding vectors with the pretrained text encoder, and then uses an adapter to align those embeddings with the input space of the decoder-only LLM, so that the LLM receives the long context as a short sequence of soft prompts. E2LLM is trained with two objectives: reconstructing the text encoded by the encoder, and long-context instruction fine-tuning. Together they teach the LLM to interpret the soft prompts and generate accurate answers. According to the reported experiments, E2LLM achieves superior performance in long-context scenarios while balancing performance, efficiency, and compatibility with pretrained models.
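The chunk, encode, and align steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_encoder` and `toy_adapter` are hypothetical stand-ins for the pretrained text encoder and the trained adapter, the chunk size and dimensions are arbitrary, and one vector per chunk is an assumption made for simplicity. The sketch only shows how a long token sequence shrinks to a short sequence of soft-prompt vectors.

```python
import hashlib
import math


def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_encoder(chunk, dim):
    """Hypothetical stand-in for a pretrained text encoder.

    Averages deterministic, hash-derived pseudo-embeddings of the tokens,
    producing a single dim-sized vector per chunk (an assumption).
    """
    vec = [0.0] * dim
    for token in chunk:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        for j in range(dim):
            vec[j] += digest[j % len(digest)] / 255.0
    return [v / len(chunk) for v in vec]


def toy_adapter(embedding, weights):
    """Hypothetical stand-in for the adapter: one linear map into the LLM input space."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


def build_soft_prompts(tokens, chunk_size, enc_dim, llm_dim):
    """Turn a long token sequence into one soft-prompt vector per chunk."""
    # Fixed weights keep the sketch deterministic; a real adapter would be trained.
    weights = [[math.sin(i + j) for j in range(enc_dim)] for i in range(llm_dim)]
    chunks = split_into_chunks(tokens, chunk_size)
    return [toy_adapter(toy_encoder(chunk, enc_dim), weights) for chunk in chunks]


tokens = ("long context " * 500).split()  # 1000 whitespace-separated tokens
soft_prompts = build_soft_prompts(tokens, chunk_size=128, enc_dim=8, llm_dim=16)
print(len(tokens), len(soft_prompts), len(soft_prompts[0]))  # prints: 1000 8 16
```

In this toy setting the decoder would attend over 8 soft-prompt vectors instead of 1000 tokens, which is the source of the efficiency gain the summary describes. The actual compression ratio depends on the chunk size and on how many vectors the real encoder emits per chunk.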