Abstract: Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. Based on this observation, we propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. Self-Route significantly reduces the computation cost while maintaining a comparable performance to LC. Our findings provide a guideline for long-context applications of LLMs using RAG and LC.

What problem does this paper attempt to address?

This paper compares and evaluates two approaches to long-text processing: Retrieval Augmented Generation (RAG) and long-context (LC) LLMs. Specifically, it aims to:

1. **Systematically compare** the performance and efficiency of RAG and LC on long inputs.
2. **Evaluate performance on different datasets**: run benchmarks on multiple public datasets to understand the advantages and disadvantages of RAG and LC.
3. **Combine the advantages of both**: propose a new method, Self-Route, which dynamically chooses between RAG and LC based on the model's self-reflection, significantly reducing computational cost while maintaining high performance.

### Main Findings

- **Performance Advantage of LC**: When resources are sufficient, LC achieves higher average performance than RAG in most cases.
- **Cost Advantage of RAG**: Although its performance is slightly lower, RAG's computational cost is significantly lower than that of LC.
- **Effectiveness of Self-Route**: By dynamically routing queries, Self-Route significantly reduces cost while keeping performance close to that of LC.

### Research Background

- **RAG**: Improves the language model's generation by retrieving relevant passages; it suits tasks that require external knowledge.
- **LC**: Recent large language models (such as Gemini-1.5 and GPT-4) perform very well at understanding long contexts directly and support longer inputs.

### Methods

- **Datasets and Metrics**: The LongBench and ∞Bench datasets, covering multiple task types, including question answering, multiple-choice questions, and summarization.
- **Models and Retrievers**: Three of the latest LLMs (Gemini-1.5-Pro, GPT-4o, GPT-3.5-Turbo) and two retrievers, Contriever and Dragon.
- **Benchmark Test Results**: LC performs better than RAG on most tasks, but RAG still has an advantage on some specific tasks.

### Adaptive Routing Method (Self-Route)

- **Motivation**: Although RAG does not perform as well as LC, the two produce highly consistent predictions on many queries.
- **Method**: The model judges, through self-reflection, whether the query can be answered. If it can, RAG is used; otherwise, the query is routed to LC.
- **Result**: Self-Route significantly reduces computational cost while maintaining performance comparable to that of LC.

### Conclusion

Through systematic comparison and experiments, the paper shows that LC performs better in long-text processing, while RAG remains valuable because of its lower cost. The proposed Self-Route method combines the advantages of both and offers a new option for practical applications.
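The routing step summarized above (answer with RAG when the model judges the query answerable, otherwise fall back to LC) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the prompt wording, the `unanswerable` sentinel, the character-count cost proxy, and the toy retriever and toy model are all assumptions made for illustration. In practice `retrieve` would wrap a real retriever and `llm` a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed sentinel the model is asked to emit when the retrieved text is not enough.
UNANSWERABLE = "unanswerable"


@dataclass
class RoutedAnswer:
    answer: str
    route: str         # "RAG" or "LC"
    prompt_chars: int  # crude cost proxy: total characters sent to the model


def self_route(
    query: str,
    full_context: str,
    retrieve: Callable[[str, str], List[str]],  # hypothetical retriever interface
    llm: Callable[[str], str],                  # hypothetical model interface
) -> RoutedAnswer:
    """Try RAG first; fall back to the full long context if the model declines."""
    chunks = retrieve(query, full_context)
    rag_prompt = (
        "Answer the question using only the passages below. "
        f"If they are not sufficient, reply exactly '{UNANSWERABLE}'.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    reply = llm(rag_prompt).strip()
    cost = len(rag_prompt)
    if UNANSWERABLE not in reply.lower():
        return RoutedAnswer(reply, "RAG", cost)

    # The model judged the retrieved passages insufficient: pay for the long context.
    lc_prompt = f"{full_context}\n\nQuestion: {query}"
    return RoutedAnswer(llm(lc_prompt).strip(), "LC", cost + len(lc_prompt))


# --- Toy stand-ins so the sketch runs without any external service ---

DOC = "The river is wide. The bridge opened in 1990. The bridge was designed by Ada."


def toy_retrieve(query: str, context: str, k: int = 1) -> List[str]:
    """Rank sentences by word overlap with the query and keep the top k."""
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    q_words = set(query.lower().replace("?", "").split())
    ranked = sorted(
        sentences,
        key=lambda s: len(q_words & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_llm(prompt: str) -> str:
    """Pretend model: answers only when the key fact appears in the prompt."""
    return "Ada" if "Ada" in prompt else UNANSWERABLE


if __name__ == "__main__":
    for q in ("Who designed the bridge?", "Which engineer created it?"):
        result = self_route(q, DOC, toy_retrieve, toy_llm)
        print(q, "->", result.route, result.answer, result.prompt_chars)
```

With these toy stand-ins, the first query is answered on the RAG route because the retrieved sentence contains the fact, while the second query shares no words with the relevant sentence, so retrieval misses and the query falls back to LC at a higher prompt cost. The sketch only pays for the long context on queries the model declines, which is the source of the cost saving described above.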

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

Long Context RAG Performance of Large Language Models

LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism

Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

Retrieval meets Long Context Large Language Models

In Defense of RAG in the Era of Long-Context Language Models

Retrieval-Augmented Generation for Large Language Models: A Survey

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

SFR-RAG: Towards Contextually Faithful LLMs

Bridging the Preference Gap between Retrievers and LLMs

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions

On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models

Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

Long^2RAG: Evaluating Long-Context Long-Form Retrieval-Augmented Generation with Key Point Recall