Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation

Zheng Liu, Chenyuan Wu, Ninglu Shao, Shitao Xiao, Chaozhuo Li, Defu Lian
2024-09-24
Abstract: The existing Retrieval-Augmented Generation (RAG) systems face significant challenges in terms of cost and effectiveness. On one hand, they need to encode the lengthy retrieved contexts before responding to the input tasks, which imposes substantial computational overhead. On the other hand, directly using generic Large Language Models (LLMs) often leads to sub-optimal answers, while task-specific fine-tuning may compromise the LLMs' general capabilities. To address these challenges, we introduce a novel approach called FlexRAG (Flexible Context Adaptation for RAG). In this approach, the retrieved contexts are compressed into compact embeddings before being encoded by the LLMs. Simultaneously, these compressed embeddings are optimized to enhance downstream RAG performance. A key feature of FlexRAG is its flexibility, which enables effective support for diverse compression ratios and selective preservation of important contexts. Thanks to these technical designs, FlexRAG achieves superior generation quality while significantly reducing running costs. Comprehensive experiments on various question-answering datasets validate our approach as a cost-effective and flexible solution for RAG systems.
Computation and Language
What problem does this paper attempt to address?
This paper attempts to address the challenges that existing Retrieval-Augmented Generation (RAG) systems face in terms of cost and effectiveness. Specifically, the main problems include:

1. **High computational cost**: Existing RAG systems need to encode the long retrieved texts, which imposes a heavy computational burden. For example, multi-hop question-answering tasks may require processing a series of related documents, and general language modeling tasks may require iteratively retrieving diverse knowledge sources.
2. **Poor answer quality**: Directly using general large language models (LLMs) usually leads to sub-optimal answer quality, especially in complex and noisy contexts. Although fine-tuning for specific tasks can improve answer quality, it may damage the performance of the LLMs on other general tasks.

To solve these problems, the paper proposes a new method, FlexRAG (Flexible Context Adaptation for RAG). FlexRAG improves on existing RAG systems in the following ways:

- **Compress the retrieved context**: Compress the retrieved context into a compact embedding representation, thereby significantly reducing the computational cost.
- **Flexible compression ratio**: Support any compression ratio and selectively retain key information according to the importance of the context.
- **Optimize performance**: Design a two-stage training process to ensure good cooperation between the compression module and the downstream LLM, while keeping the original parameters of the LLM unchanged to avoid performance degradation.

In summary, FlexRAG aims to achieve a more efficient, more flexible, and more cost-effective RAG system, thereby improving generation quality while reducing cost.
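
The idea of a flexible, importance-aware compression ratio can be illustrated with a small sketch. This is not the paper's implementation: the summary above does not specify how compression or importance estimation is done, so the example assumes strided down-sampling of per-token embeddings as a stand-in for compression and takes importance scores as given inputs. The names `downsample`, `compress_context`, `low_ratio`, `high_ratio`, and `top_fraction` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def downsample(embeddings: Sequence[Vector], ratio: int) -> List[Vector]:
    """Keep the last embedding of every window of `ratio` embeddings.

    Assumption: strided down-sampling stands in for the (unspecified)
    compression step. A trailing partial window still contributes one embedding.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    kept = []
    for start in range(0, len(embeddings), ratio):
        window = embeddings[start:start + ratio]
        kept.append(window[-1])
    return kept


def compress_context(
    chunks: Sequence[Sequence[Vector]],
    importance: Sequence[float],
    low_ratio: int,
    high_ratio: int,
    top_fraction: float = 0.5,
) -> List[Vector]:
    """Compress important chunks lightly and the remaining chunks heavily.

    Assumption: importance scores come from some external estimator;
    the top `top_fraction` of chunks by score use `low_ratio`,
    all other chunks use `high_ratio`.
    """
    if len(chunks) != len(importance):
        raise ValueError("need one importance score per chunk")
    n_important = math.ceil(len(chunks) * top_fraction)
    ranked = sorted(range(len(chunks)), key=lambda i: importance[i], reverse=True)
    important = set(ranked[:n_important])

    compressed: List[Vector] = []
    for i, chunk in enumerate(chunks):
        ratio = low_ratio if i in important else high_ratio
        compressed.extend(downsample(chunk, ratio))
    return compressed


# Toy input: 4 chunks of 8 one-dimensional "embeddings" each (32 in total).
chunks = [[[float(c * 8 + t)] for t in range(8)] for c in range(4)]
scores = [0.9, 0.1, 0.4, 0.2]
compressed = compress_context(chunks, scores, low_ratio=2, high_ratio=8)
print(len(compressed))  # 10 embeddings remain out of 32
```

In this toy run the two highest-scoring chunks keep 4 embeddings each and the other two keep 1 each, so the downstream model would see 10 embeddings instead of 32, an effective compression ratio of 3.2. Changing `low_ratio`, `high_ratio`, or `top_fraction` shifts that trade-off without touching the rest of the pipeline, which is the kind of flexibility the summary describes.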