Reducing Distraction in Long-Context Language Models by Focused Learning

Zijun Wu, Bingyuan Liu, Ran Yan, Lei Chen, Thomas Delteil

2024-11-09

Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their capacity to process long contexts. However, effectively utilizing this long context remains a challenge due to the issue of distraction, where irrelevant information dominates lengthy contexts, causing LLMs to lose focus on the most relevant segments. To address this, we propose a novel training method that enhances LLMs' ability to discern relevant information through a unique combination of retrieval-based data augmentation and contrastive learning. Specifically, during fine-tuning with long contexts, we employ a retriever to extract the most relevant segments, serving as augmented inputs. We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned. Extensive experiments on long single-document and multi-document QA benchmarks demonstrate the effectiveness of our proposed method.
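The retrieval-based augmentation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The page does not say which retriever the paper uses, so a toy word-overlap (Jaccard) score stands in for it. The placeholder marker `[...]`, the helper names, and the segment budget `k` are all hypothetical; the marker reflects the problem summary's mention of replacing irrelevant information with special markers.

```python
import re
from typing import List, Set

# Hypothetical marker for dropped segments (the paper's actual marker is not given here).
PLACEHOLDER = "[...]"


def tokenize(text: str) -> Set[str]:
    """Lowercased word set: a toy stand-in for a real retriever's encoder."""
    return set(re.findall(r"\w+", text.lower()))


def score(question: str, segment: str) -> float:
    """Jaccard overlap between question words and segment words (toy relevance score)."""
    q, s = tokenize(question), tokenize(segment)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def build_focused_context(question: str, segments: List[str], k: int = 2) -> List[str]:
    """Keep the top-k segments in their original order and replace the rest with a marker."""
    ranked = sorted(range(len(segments)),
                    key=lambda i: score(question, segments[i]),
                    reverse=True)
    keep = set(ranked[:k])
    out: List[str] = []
    for i, seg in enumerate(segments):
        if i in keep:
            out.append(seg)
        elif not out or out[-1] != PLACEHOLDER:
            # Collapse consecutive dropped segments into a single marker.
            out.append(PLACEHOLDER)
    return out


segments = [
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
    "Cats sleep a lot.",
]
print(build_focused_context("Where is the Eiffel Tower?", segments, k=2))
# ['The Eiffel Tower is in Paris.', '[...]', 'Paris is the capital of France.', '[...]']
```

Kept segments stay in document order, so the focused sub-context still reads as a shortened version of the original rather than a reranked list.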

Computation and Language

What problem does this paper attempt to address?

The paper addresses distraction in long-context language models. When a long input contains a large amount of irrelevant information, large language models (LLMs) can lose focus on the most relevant parts and fail to use them effectively. This hurts task performance, especially in long-document question answering, where the answer may be buried in a large amount of text. This is known as the "distraction problem". To address it, the authors propose a training method that improves LLMs' ability to identify relevant information by combining retrieval-based data augmentation with contrastive learning:

1. **Retrieval-based data augmentation**: During fine-tuning, a retriever extracts the paragraphs most relevant to the question as augmented input, and irrelevant information is replaced with special markers.
2. **Contrastive learning**: An auxiliary contrastive learning objective keeps the outputs for the original context and for the retrieved sub-context closely aligned, guiding the model to attend to the most relevant content in the long input.

The method aims to remove the dependence on an external retriever at inference time while mitigating distraction in long contexts. Experiments on several long single-document and multi-document question-answering benchmarks show that the method performs well and significantly reduces errors caused by distraction.
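The contrastive alignment objective can be illustrated with an InfoNCE-style loss. This is a sketch under assumptions, and the paper's exact formulation may differ. It assumes that the "outputs" are fixed-size vectors (for example, pooled hidden states) for the full and focused contexts of each example, and that other examples in the batch act as negatives. It also assumes that the auxiliary term is added to the language-modeling losses with a hypothetical weight `lam`. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def info_nce(full: List[Vector], focused: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: row i of `full` should match row i of `focused`,
    and every other row of `focused` in the batch serves as a negative."""
    n = len(full)
    total = 0.0
    for i in range(n):
        logits = [cosine(full[i], focused[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # equals -log softmax(logits)[i]
    return total / n


def total_loss(lm_full: float, lm_focused: float, contrastive: float, lam: float = 0.1) -> float:
    """Hypothetical combination: LM losses on both inputs plus the weighted auxiliary term."""
    return lm_full + lm_focused + lam * contrastive


full = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(full, aligned))  # close to 0: each pair already matches
print(info_nce(full, swapped))  # about 10: each output is closer to a negative
```

Minimizing this term pulls the representation of each full-context input toward that of its own focused sub-context, which matches the stated goal of keeping the two outputs closely aligned.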


FocusLLM: Precise Understanding of Long Context by Dynamic Condensing

FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding

Core Context Aware Attention for Long Context Language Modeling

Recycled Attention: Efficient inference for long-context language models

Focused Large Language Models are Stable Many-Shot Learners

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

A Controlled Study on Long Context Extension and Generalization in LLMs

Training-Free Long-Context Scaling of Large Language Models

Retrieval meets Long Context Large Language Models

Empower Your Model with Longer and Better Context Comprehension

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Large Language Models Know What Makes Exemplary Contexts

ReAttention: Training-Free Infinite Context with Finite Attention Scope

How to Train Long-Context Language Models (Effectively)

Focused Transformer: Contrastive Training for Context Scaling

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

Large Language Models Can Self-Improve in Long-context Reasoning

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

Enhancing Large Language Models' Situated Faithfulness to External Contexts