In Defense of RAG in the Era of Long-Context Language Models

Tan Yu,Anbang Xu,Rama Akkiraju

2024-09-03

Abstract: By working around the limited context windows of early-generation LLMs, retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation. Recently, the emergence of long-context LLMs has allowed models to incorporate much longer text sequences, making RAG less attractive. Recent studies show that long-context LLMs significantly outperform RAG in long-context applications. Unlike the existing works favoring the long-context LLM over RAG, we argue that an extremely long context in LLMs suffers from a diminished focus on relevant information and leads to potential degradation in answer quality. This paper revisits RAG in long-context answer generation. We propose an order-preserve retrieval-augmented generation (OP-RAG) mechanism, which significantly improves the performance of RAG for long-context question-answer applications. With OP-RAG, as the number of retrieved chunks increases, the answer quality initially rises and then declines, forming an inverted U-shaped curve. There exist sweet points where OP-RAG achieves higher answer quality with far fewer tokens than a long-context LLM that takes the whole context as input. Extensive experiments on a public benchmark demonstrate the superiority of our OP-RAG.

Computation and Language

What problem does this paper attempt to address?

The paper examines whether retrieval-augmented generation (RAG) is still effective now that long-context large language models (LLMs) are available. The authors observe that the order in which retrieved chunks are presented to the model is crucial for answer quality on long documents. Traditional RAG places the retrieved chunks in descending order of relevance. The paper instead proposes order-preserve retrieval-augmented generation (OP-RAG), which keeps the retrieved chunks in the order in which they appear in the original document. Experimental results show that this mechanism significantly improves RAG performance on long-context question-answering tasks.

As the number of retrieved chunks increases, answer quality first improves and then declines, forming an inverted U-shaped curve. Retrieving more chunks brings in more relevant information, which helps answer quality, but retrieving too many also brings in irrelevant or distracting information, which hurts it. There is therefore an optimal number of chunks at which the balance between relevant and irrelevant information yields the highest answer quality.

The paper also compares OP-RAG experimentally with approaches that rely solely on a long-context LLM. OP-RAG achieves higher F1 scores even with significantly shorter inputs, which indicates that effective retrieval and focused use of context can outperform feeding the entire long context to the model.
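The difference between relevance-ordered and order-preserving retrieval can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word-overlap scorer `overlap_score`, the helper names, and the toy chunks are all assumptions made for illustration, and a real system would typically use an embedding-based retriever instead. Both strategies select the same top-k chunks; only the order in which the chunks are handed to the LLM differs.

```python
from typing import List


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score (assumption): distinct query words found in the chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def top_k_indices(chunks: List[str], query: str, k: int) -> List[int]:
    """Indices of the k highest-scoring chunks, most relevant first."""
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: overlap_score(query, chunks[i]),
        reverse=True,
    )
    return ranked[:k]


def relevance_ordered(chunks: List[str], query: str, k: int) -> List[str]:
    """Conventional RAG ordering: chunks in descending order of relevance."""
    return [chunks[i] for i in top_k_indices(chunks, query, k)]


def order_preserving(chunks: List[str], query: str, k: int) -> List[str]:
    """Order-preserving variant: same top-k chunks, restored to document order."""
    return [chunks[i] for i in sorted(top_k_indices(chunks, query, k))]


# Hypothetical document, already split into chunks in reading order.
chunks = [
    "alice founded the lab in 2010",
    "weather was mild that year",
    "the team moved to boston in 2015",
    "alice retired after the lab moved",
]
query = "when did alice retire after the lab moved"

print(relevance_ordered(chunks, query, 3))
print(order_preserving(chunks, query, 3))
```

With `k=3`, the relevance-ordered context starts with the last chunk of the document, while the order-preserving context keeps the three selected chunks in their original sequence. Sweeping `k` and measuring answer quality for each value is how one would look for the peak of the inverted U-shaped curve described above.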

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

Long Context RAG Performance of Large Language Models

Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

SFR-RAG: Towards Contextually Faithful LLMs

Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism

RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems

Long$^2$RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall

Context Embeddings for Efficient Answer Generation in RAG

Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

RAG based Question-Answering for Contextual Response Prediction System

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Better RAG using Relevant Information Gain

Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation

RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation