Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval

Qingfei Zhao, Ruobing Wang, Xin Wang, Daren Zha, Nan Mu
2024-11-01
Abstract: Retrieval-Augmented Generation (RAG) has emerged as a reliable external knowledge augmentation technique to mitigate hallucination issues and parameterized knowledge limitations in Large Language Models (LLMs). Existing Adaptive RAG (ARAG) systems struggle to effectively explore multiple retrieval sources due to their inability to select the right source at the right time. To address this, we propose a multi-source ARAG framework, termed MSPR, which synergizes reasoning and preference-driven retrieval to adaptively decide "when and what to retrieve" and "which retrieval source to use". To better adapt to retrieval sources of differing characteristics, we also employ retrieval action adjustment and answer feedback strategies. They enable our framework to fully explore the high-quality primary source while supplementing it with secondary sources at the right time. Extensive and multi-dimensional experiments conducted on three datasets demonstrate the superiority and effectiveness of MSPR.
Computation and Language
What problem does this paper attempt to address?
The paper addresses a limitation of existing Adaptive Retrieval-Augmented Generation (ARAG) systems: in multi-source retrieval they cannot reliably select the appropriate source, so high-quality sources are underused. Specifically:

1. **Challenges in multi-source retrieval**: Existing ARAG systems struggle to explore multiple retrieval sources because they cannot select the right source at the right time. As a result, high-quality sources are underused, and performance suffers on complex tasks such as multi-hop question answering.
2. **Differing characteristics across sources**: Retrieval sources differ substantially in content quality, scale, and knowledge density; for example, a high-quality but small local corpus versus a large but lower-quality web source accessed through a browser. Naively introducing sources with such different characteristics can lead to blind source selection, which prevents the high-quality source from being prioritized and explored in depth.
3. **Limitations of the adaptive information-gathering process**: Even when given basic descriptions of retrieval preferences, existing methods still struggle to choose the optimal source and the right timing during adaptive retrieval.

To address these problems, the paper proposes MSPR, a multi-source ARAG framework that constrains the adaptive knowledge-gathering process by synergizing reasoning with preference-driven retrieval, enabling more effective multi-source retrieval and knowledge augmentation.
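To make the idea of "exploring the high-quality primary source first and supplementing it with a secondary source only when needed" concrete, here is a minimal sketch. It is not MSPR: it is a simplified, preference-ordered cascade under stated assumptions. The `Source` class, the keyword-overlap retriever, the `covers_query` feedback check (a stand-in for answer feedback), the pre-decomposed sub-queries, and the sample documents are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Source:
    """Hypothetical retrieval source: a named list of plain-text documents."""
    name: str
    docs: list[str]

    def search(self, query: str, k: int = 2) -> list[str]:
        # Toy lexical retriever (an assumption, not the paper's retriever):
        # rank documents by how many query words they share, drop zero-overlap docs.
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), d) for d in self.docs]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: -pair[0])  # stable sort keeps original order on ties
        return [d for _, d in scored[:k]]


def covers_query(query: str, hits: list[str]) -> bool:
    # Hypothetical feedback check: some hit contains every word of the sub-query.
    terms = set(query.lower().split())
    return any(terms <= set(h.lower().split()) for h in hits)


def gather_evidence(
    sub_queries: list[str],
    sources: list[Source],
    is_sufficient: Callable[[str, list[str]], bool],
) -> tuple[list[str], list[tuple[str, str, int]]]:
    """For each sub-query, consult sources in preference order (primary first)
    and stop at the first source whose results pass the feedback check.
    Sub-queries that no source satisfies contribute no evidence."""
    evidence: list[str] = []
    trace: list[tuple[str, str, int]] = []
    for q in sub_queries:
        for src in sources:
            hits = src.search(q)
            trace.append((q, src.name, len(hits)))
            if is_sufficient(q, hits):
                for h in hits:
                    if h not in evidence:
                        evidence.append(h)
                break  # preferred source sufficed; skip lower-priority sources
    return evidence, trace


# Usage with made-up data: a small local corpus preferred over a larger web source.
local = Source("local_corpus", ["marie curie was born in warsaw",
                                "warsaw is the capital of poland"])
web = Source("web", ["poland joined the european union in 2004",
                     "warsaw has many museums",
                     "marie curie won two nobel prizes"])
sub_queries = ["marie curie born", "capital of poland", "poland joined european union"]
evidence, trace = gather_evidence(sub_queries, [local, web], covers_query)
for step in trace:
    print(step)
```

In this toy run the first two sub-queries are answered from the local corpus alone, and only the third falls through to the web source, which mirrors the intended priority ordering. A real system would replace the keyword retriever and the word-coverage check with learned retrieval and an LLM-based judgment of answer quality.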