RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems

Jennifer Hsia, Afreen Shaikh, Zhiruo Wang, Graham Neubig
2024-08-13
Abstract: Retrieval-augmented generation (RAG) can significantly improve the performance of language models (LMs) by providing additional context for tasks such as document-based question answering (DBQA). However, the effectiveness of RAG is highly dependent on its configuration. To systematically find the optimal configuration, we introduce RAGGED, a framework for analyzing RAG configurations across various DBQA tasks. Using the framework, we discover distinct LM behaviors in response to varying context quantities, context qualities, and retrievers. For instance, while some models are robust to noisy contexts, monotonically performing better with more contexts, others are more noise-sensitive and can effectively use only a few contexts before declining in performance. This framework also provides a deeper analysis of these differences by evaluating the LMs' sensitivity to signal and noise under specific context quality conditions. Using RAGGED, researchers and practitioners can derive actionable insights about how to optimally configure their RAG systems for their specific question-answering tasks.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how to configure Retrieval-Augmented Generation (RAG) systems for Document-Based Question Answering (DBQA) tasks. The authors introduce a framework named RAGGED to analyze how RAG systems perform under different configurations. RAG systems improve language models on knowledge-intensive generation tasks (such as document-based question answering) by retrieving relevant passages from a large document collection and supplying them as additional context. Configuring these systems well is not straightforward, however. Different language models have different context-length limits, and the existing literature gives conflicting recommendations on how many retrieved passages to provide and on how passage quality affects final performance. To find good configurations systematically, the RAGGED framework analyzes RAG systems along three dimensions:
1. **Effective Number of Context Passages**: How different model architectures respond to changes in the number of context passages. Some models improve monotonically as more contexts are added, while others peak at a certain number of contexts and then decline.
2. **Context Utilization Behaviors**: How reader models perform under different context quality conditions, in particular how well they distinguish and use relevant information when the context contains sufficient information (signal) alongside irrelevant information (noise).
3. **Influence of Retriever Choice**: How the choice of retriever affects reader performance, especially on datasets from different domains (such as Wikipedia or PubMed) and on questions of varying complexity (single-hop or multi-hop).
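The first analysis above (how reader performance changes with the number of context passages) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `classify_k_curve`, the `tolerance` parameter, and the scores in the usage example are hypothetical and chosen only to show the two behaviors the summary describes.

```python
from typing import Dict


def classify_k_curve(scores_by_k: Dict[int, float], tolerance: float = 0.0) -> dict:
    """Summarize how a reader's score changes as more passages are supplied.

    scores_by_k maps a number of context passages (k) to an evaluation score
    measured at that k. Both the keys and the metric are assumptions here.
    """
    ks = sorted(scores_by_k)
    scores = [scores_by_k[k] for k in ks]

    # Index of the best score; ties resolve to the smallest k.
    best_index = max(range(len(ks)), key=lambda i: scores[i])

    # The curve counts as monotonic if no step drops by more than `tolerance`.
    monotonic = all(b >= a - tolerance for a, b in zip(scores, scores[1:]))

    return {
        "best_k": ks[best_index],
        "best_score": scores[best_index],
        "shape": "monotonic" if monotonic else "non-monotonic",
    }


# Illustrative scores only, not results from the paper.
robust_reader = {1: 0.40, 5: 0.48, 10: 0.52, 20: 0.55}
sensitive_reader = {1: 0.42, 5: 0.50, 10: 0.47, 20: 0.41}

print(classify_k_curve(robust_reader))     # best_k is 20, shape is "monotonic"
print(classify_k_curve(sensitive_reader))  # best_k is 5, shape is "non-monotonic"
```

In practice, the scores would come from running each reader model at each k on a held-out set; a nonzero `tolerance` can absorb small fluctuations that should not count as a decline.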
Through the RAGGED framework, researchers can gain a deep understanding of the performance of different RAG component combinations under specific conditions, thereby providing practitioners with concrete guidance on how to optimize the configuration of RAG systems. Additionally, the paper details the experimental setup, datasets used, and evaluation metrics to ensure the validity and reproducibility of the analysis results.
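The context-utilization analysis summarized earlier (reader behavior with signal versus noise) can be approximated by splitting evaluation examples according to whether the retrieved passages contain any gold passage. The sketch below is an assumption about how such a split might be coded, not the paper's code: the function `split_by_signal` and the record fields `retrieved_ids`, `gold_ids`, and `score` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def split_by_signal(examples: List[dict]) -> Dict[str, Optional[float]]:
    """Average reader score for examples with and without a gold passage retrieved.

    Each example is a dict with hypothetical fields:
      retrieved_ids: ids of the passages given to the reader
      gold_ids:      ids of passages known to contain the answer
      score:         the reader's per-example evaluation score
    """
    with_signal: List[float] = []
    noise_only: List[float] = []

    for ex in examples:
        # "Signal" here means at least one gold passage is among the retrieved ones.
        has_signal = bool(set(ex["retrieved_ids"]) & set(ex["gold_ids"]))
        (with_signal if has_signal else noise_only).append(ex["score"])

    return {
        "with_signal": mean(with_signal) if with_signal else None,
        "noise_only": mean(noise_only) if noise_only else None,
    }


# Illustrative records only, not data from the paper.
examples = [
    {"retrieved_ids": ["p1", "p2"], "gold_ids": ["p2"], "score": 1.0},
    {"retrieved_ids": ["p3", "p4"], "gold_ids": ["p9"], "score": 0.0},
    {"retrieved_ids": ["p5"], "gold_ids": ["p5"], "score": 0.5},
]

print(split_by_signal(examples))  # {'with_signal': 0.75, 'noise_only': 0.0}
```

Comparing the two averages across reader models gives a rough view of which readers benefit most from relevant passages and which are most affected when only irrelevant passages are retrieved.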