Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky
2024-10-18
Abstract: Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. Based on this observation, we propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. Self-Route significantly reduces the computation cost while maintaining a comparable performance to LC. Our findings provide a guideline for long-context applications of LLMs using RAG and LC.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper compares and evaluates two approaches to processing long texts: Retrieval Augmented Generation (RAG) and Long-Context LLMs (LC). Specifically, the paper aims to:

1. **Systematically compare** the performance and efficiency of RAG and LC when processing long texts.
2. **Evaluate performance on different datasets**: Run benchmarks on multiple public datasets to understand the strengths and weaknesses of RAG and LC.
3. **Combine the advantages of both**: Propose a new method, SELF-ROUTE, which uses the model's self-reflection to choose between RAG and LC for each query, significantly reducing computational cost while keeping performance comparable to LC.

### Main Findings

- **Performance Advantage of LC**: When resources are sufficient, LC achieves higher average performance than RAG.
- **Cost Advantage of RAG**: Although RAG's average performance is lower, its computational cost is significantly lower than that of LC.
- **Effectiveness of SELF-ROUTE**: By routing queries dynamically, SELF-ROUTE significantly reduces cost while maintaining performance close to that of LC.

### Research Background

- **RAG**: Augments the language model's generation with retrieved relevant passages; it suits tasks that require external knowledge or inputs too long to process directly.
- **LC**: Recent large language models (such as Gemini-1.5 and GPT-4) understand long contexts directly and accept much longer inputs.

### Methods

- **Datasets and Metrics**: Uses the LongBench and ∞Bench datasets, which cover multiple task types, including question answering, multiple-choice questions, and summarization.
- **Models and Retrievers**: Evaluates three recent LLMs (Gemini-1.5-Pro, GPT-4o, GPT-3.5-Turbo) with two retrievers, Contriever and Dragon.
- **Benchmark Test Results**: LC performs better than RAG on most tasks, but RAG still has an advantage on some specific tasks.

### Adaptive Routing Method (SELF-ROUTE)

- **Motivation**: Although RAG performs worse than LC on average, the two produce highly consistent predictions on many queries.
- **Method**: The model is given the query together with the retrieved passages and judges, through self-reflection, whether the query can be answered from them. If it can, the RAG answer is used; otherwise the query is routed to LC.
- **Result**: SELF-ROUTE significantly reduces the computational cost while maintaining performance comparable to that of LC.

### Conclusion

Through systematic comparison and experiments, the paper shows that LC performs better in long-text processing, while RAG remains valuable because of its lower cost. The proposed SELF-ROUTE method combines the advantages of both and offers a practical option for long-context applications.
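
The SELF-ROUTE routing step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: `retrieve` and `generate` are hypothetical callables standing in for a retriever (such as Contriever or Dragon) and an LLM call, the prompt wording is an assumption, and `top_k=5` is an arbitrary default. The sketch only shows the control flow: try RAG first, and fall back to the full long context when the model declares the query unanswerable.

```python
from typing import Callable, List, Tuple

# Assumed marker the model is asked to emit when the retrieved text is insufficient.
UNANSWERABLE = "unanswerable"


def self_route(
    query: str,
    document: str,
    retrieve: Callable[[str, str, int], List[str]],  # hypothetical retriever
    generate: Callable[[str], str],                  # hypothetical LLM call
    top_k: int = 5,
) -> Tuple[str, str]:
    """Return (answer, route), where route is "rag" or "lc"."""
    # Step 1: cheap path. Answer from the top-k retrieved chunks only,
    # and let the model decline if they are not enough.
    chunks = retrieve(query, document, top_k)
    rag_prompt = (
        "Answer the query using only the provided text. "
        f"Write '{UNANSWERABLE}' if the query cannot be answered from it.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuery: {query}"
    )
    answer = generate(rag_prompt)
    if UNANSWERABLE not in answer.lower():
        return answer, "rag"

    # Step 2: expensive path. Pass the full document to the long-context model.
    lc_prompt = f"{document}\n\nQuery: {query}"
    return generate(lc_prompt), "lc"
```

Under these assumptions, the cost saving comes from the first branch: queries answered from the retrieved chunks never incur the full-document prompt, and only the declined queries pay for both calls.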