In Defense of RAG in the Era of Long-Context Language Models

Tan Yu, Anbang Xu, Rama Akkiraju
2024-09-03
Abstract: By overcoming the limited context windows of early-generation LLMs, retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation. Recently, the emergence of long-context LLMs has allowed models to incorporate much longer text sequences, making RAG less attractive. Recent studies show that long-context LLMs significantly outperform RAG in long-context applications. Unlike existing works that favor the long-context LLM over RAG, we argue that an extremely long context in LLMs suffers from a diminished focus on relevant information and can degrade answer quality. This paper revisits RAG in long-context answer generation. We propose an order-preserve retrieval-augmented generation (OP-RAG) mechanism, which significantly improves the performance of RAG for long-context question-answer applications. With OP-RAG, as the number of retrieved chunks increases, the answer quality initially rises and then declines, forming an inverted U-shaped curve. There exist sweet spots where OP-RAG achieves higher answer quality with far fewer tokens than a long-context LLM taking the whole context as input. Extensive experiments on public benchmarks demonstrate the superiority of our OP-RAG.
Computation and Language
What problem does this paper attempt to address?
The paper examines whether Retrieval-Augmented Generation (RAG) is still effective now that long-context large language models (LLMs) are available. The authors observe that the order in which retrieved segments are presented to the model is crucial for answer quality on long documents. Traditional RAG arranges the retrieved segments in descending order of relevance. The paper instead proposes Order-Preserve Retrieval-Augmented Generation (OP-RAG), which keeps the retrieved segments in the order in which they appear in the original document. Experimental results show that this mechanism significantly improves RAG performance on long-context question-answering tasks.

As the number of retrieved segments increases, answer quality first improves and then declines, forming an inverted U-shaped curve. Adding segments initially supplies more relevant information, which improves the answer. Past a certain point, the additional segments are mostly irrelevant or distracting, which reduces the model's performance. There is therefore an optimal number of segments at which the balance between relevant and irrelevant information gives the highest answer quality.

The paper also compares OP-RAG experimentally with approaches that skip retrieval and feed the entire context to a long-context LLM. OP-RAG achieves higher F1 scores even though its input is significantly shorter. This indicates that effective retrieval and focused use of context can outperform feeding an extremely long context directly to the model.
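The order-preserving step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how relevance is scored, so `overlap_score` is a toy word-overlap stand-in assumed here for illustration, and every function name (`top_k_indices`, `vanilla_rag_context`, `op_rag_context`) is hypothetical. The only point it shows is the difference between the two orderings of the same top-k chunks.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, chunk):
    """Toy relevance score (assumption): count of query tokens also found in the chunk."""
    q = Counter(tokenize(query))
    c = Counter(tokenize(chunk))
    return sum(min(q[t], c[t]) for t in q)


def top_k_indices(query, chunks, k):
    """Indices of the k highest-scoring chunks, best first (ties broken by position)."""
    scores = [overlap_score(query, c) for c in chunks]
    ranked = sorted(range(len(chunks)), key=lambda i: (-scores[i], i))
    return ranked[:k]


def vanilla_rag_context(query, chunks, k):
    """Conventional ordering: selected chunks in descending order of relevance."""
    return [chunks[i] for i in top_k_indices(query, chunks, k)]


def op_rag_context(query, chunks, k):
    """Order-preserving variant: same chunks, restored to their document order."""
    return [chunks[i] for i in sorted(top_k_indices(query, chunks, k))]


# Hypothetical four-chunk document and query, used only to show the two orderings.
chunks = [
    "Alice founded the company in 1999.",    # chunk 0
    "The weather that year was mild.",       # chunk 1
    "In 2005 the company moved to Berlin.",  # chunk 2
    "Alice sold the company in 2010.",       # chunk 3
]
query = "When did Alice sell the company?"

# Relevance order puts chunks 0, 3, 2; document order puts the same chunks as 0, 2, 3.
print(vanilla_rag_context(query, chunks, k=3))
print(op_rag_context(query, chunks, k=3))
```

Both functions select the same chunks; only the order of the context passed to the model differs. In a real pipeline the scoring function would typically be replaced by an embedding-based similarity, and `k` would be tuned, since the summary reports that answer quality peaks at an intermediate number of chunks.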