IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

Diji Yang, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Jie Yang, Yi Zhang
2024-05-15
Abstract: Although the Retrieval-Augmented Generation (RAG) paradigms can use external knowledge to enhance and ground the outputs of Large Language Models (LLMs) to mitigate generative hallucinations and static knowledge base problems, they still suffer from limited flexibility in adopting Information Retrieval (IR) systems with varying capabilities, constrained interpretability during the multi-round retrieval process, and a lack of end-to-end optimization. To address these challenges, we propose a novel LLM-centric approach, IM-RAG, that integrates IR systems with LLMs to support multi-round RAG through learning Inner Monologues (IM, i.e., the human inner voice that narrates one's thoughts). During the IM process, the LLM serves as the core reasoning model (i.e., Reasoner) to either propose queries to collect more information via the Retriever or to provide a final answer based on the conversational context. We also introduce a Refiner that improves the outputs from the Retriever, effectively bridging the gap between the Reasoner and IR modules with varying capabilities and fostering multi-round communications. The entire IM process is optimized via Reinforcement Learning (RL) where a Progress Tracker is incorporated to provide mid-step rewards, and the answer prediction is further separately optimized via Supervised Fine-Tuning (SFT). We conduct extensive experiments with the HotPotQA dataset, a popular benchmark for retrieval-based, multi-step question-answering. The results show that our approach achieves state-of-the-art (SOTA) performance while providing high flexibility in integrating IR modules as well as strong interpretability exhibited in the learned inner monologues.
Computation and Language, Artificial Intelligence, Information Retrieval
What problem does this paper attempt to address?
This paper targets limitations of existing Retrieval-Augmented Generation (RAG) methods. RAG uses information retrieval (IR) systems to ground the outputs of large language models (LLMs), which mitigates hallucination and the static-knowledge problem, but current approaches have limited flexibility in adopting IR systems of varying capability, limited interpretability during multi-round retrieval, and no end-to-end optimization. The paper introduces IM-RAG, which supports multi-round retrieval-augmented generation by learning "inner monologues" (IM). At its core is an LLM-based reasoning model (the Reasoner) that, at each step, either issues a query to gather more information through the Retriever or gives a final answer based on the conversational context. A Refiner improves the Retriever's output, bridging the gap between the Reasoner and IR modules of varying capability and facilitating multi-round communication. The entire IM process is optimized with reinforcement learning (RL), with a Progress Tracker supplying mid-step rewards, and answer prediction is further optimized separately through supervised fine-tuning (SFT). Experiments on HotPotQA, a popular benchmark for retrieval-based multi-step question answering, show that IM-RAG achieves state-of-the-art performance while offering high flexibility in integrating IR modules and strong interpretability through the learned inner monologues.
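The multi-round loop described above can be pictured with a minimal Python sketch. It is not the authors' implementation: the function names (`im_rag_loop`, `toy_reasoner`, `toy_retriever`, `toy_refiner`, `toy_tracker`), the three-sentence corpus, the scripted Reasoner that stands in for an LLM, and the coverage-based progress score are all assumptions made for illustration. No RL or SFT training is shown; the sketch only illustrates how the Reasoner, Retriever, Refiner, and Progress Tracker might exchange information at inference time.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy corpus and gold supporting facts (not from the paper).
CORPUS = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Berlin is the capital of Germany.",
]
GOLD = {CORPUS[0], CORPUS[1]}


@dataclass
class Turn:
    """One round of the inner monologue: a query, refined evidence, a score."""
    query: str
    evidence: List[str]
    progress: float


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def toy_retriever(query: str) -> List[str]:
    # Rank documents by word overlap with the query (stand-in for a real IR system).
    return sorted(CORPUS, key=lambda d: len(tokens(d) & tokens(query)), reverse=True)


def toy_refiner(query: str, docs: List[str], top_k: int = 1) -> List[str]:
    # Assumed behavior: keep only the top-ranked passages for the Reasoner.
    return docs[:top_k]


def toy_tracker(context: List[Turn], evidence: List[str]) -> float:
    # Assumed mid-step score: fraction of gold facts collected so far.
    seen = {e for t in context for e in t.evidence} | set(evidence)
    return len(seen & GOLD) / len(GOLD)


def toy_reasoner(question: str, context: List[Turn],
                 force_answer: bool = False) -> Tuple[str, str]:
    # Scripted stand-in for the LLM: query twice, then answer.
    script = ["capital of France", "river through Paris"]
    if not force_answer and len(context) < len(script):
        return "query", script[len(context)]
    return "answer", "The Seine"


def im_rag_loop(question: str, max_rounds: int = 4) -> Tuple[str, List[Turn]]:
    context: List[Turn] = []
    for _ in range(max_rounds):
        action, text = toy_reasoner(question, context)
        if action == "answer":
            return text, context
        evidence = toy_refiner(text, toy_retriever(text))
        context.append(Turn(text, evidence, toy_tracker(context, evidence)))
    # Round budget exhausted: ask the Reasoner to commit to an answer.
    _, text = toy_reasoner(question, context, force_answer=True)
    return text, context


answer, trace = im_rag_loop("Which river flows through the capital of France?")
for turn in trace:
    print(turn.query, "->", turn.evidence, "progress:", turn.progress)
print("answer:", answer)
```

In this sketch the list of `Turn` records plays the role of the inner monologue: each query and the evidence it returned can be read back, which is the kind of interpretability the summary refers to. In a trained system the per-turn progress score would feed the RL reward rather than just being logged.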