Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA

Nirmal Roy, Leonardo F. R. Ribeiro, Rexhina Blloshmi, Kevin Small
2024-09-24
Abstract: Augmenting Large Language Models (LLMs) with information retrieval capabilities (i.e., Retrieval-Augmented Generation (RAG)) has proven beneficial for knowledge-intensive tasks. However, understanding users' contextual search intent when generating responses is an understudied topic for conversational question answering (QA). This conversational extension leads to additional concerns when compared to single-turn QA as it is more challenging for systems to comprehend conversational context and manage retrieved passages over multiple turns. In this work, we propose a method for enabling LLMs to decide when to retrieve in RAG settings given a conversational context. When retrieval is deemed necessary, the LLM then rewrites the conversation for passage retrieval and judges the relevance of returned passages before response generation. Operationally, we build on the single-turn SELF-RAG framework (Asai et al., 2023) and propose SELF-multi-RAG for conversational settings. SELF-multi-RAG demonstrates improved capabilities over single-turn variants with respect to retrieving relevant passages (by using summarized conversational context) and assessing the quality of generated responses. Experiments on three conversational QA datasets validate the enhanced response generation capabilities of SELF-multi-RAG, with improvements of ~13% measured by human annotation.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to enable large language models (LLMs), in multi-turn conversational question answering (CQA), to decide from the conversational context whether retrieval is needed and, when it is, to rewrite the conversation into a retrieval query, judge the relevance of the returned passages, and generate a high-quality answer. Specifically, the paper proposes the SELF-multi-RAG framework, which aims to:

1. **Understand the conversational context**: In multi-turn conversations, the system must consider the earlier conversation history as well as the current question in order to determine whether new information needs to be retrieved or whether previously retrieved information is enough to generate an answer.
2. **Summarize the conversational context**: To improve retrieval effectiveness, the model is trained to summarize the conversation history into a query. This retrieves relevant passages more effectively and avoids the information loss that rewriting only the current question can cause.

With these capabilities, SELF-multi-RAG can decide more accurately when to retrieve, how to rewrite the query, and how to generate an answer from the retrieval results in multi-turn conversations, which improves overall answer quality. Experiments on three conversational QA datasets show improved response generation over single-turn variants, with gains of about 13% as measured by human annotation.
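The decision flow described above (decide whether to retrieve, rewrite the conversation into a query, filter the returned passages, then respond) can be illustrated with a minimal Python sketch. This is not the paper's implementation: in SELF-multi-RAG a single LLM makes these decisions, whereas here every function name (`answer_turn`, `toy_needs_retrieval`, and so on), the tiny `CORPUS`, and the keyword-overlap heuristics are hypothetical stand-ins chosen only to make the control flow runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set


@dataclass
class Turn:
    speaker: str  # "user" or "assistant"
    text: str


def answer_turn(
    history: Sequence[Turn],
    needs_retrieval: Callable[[Sequence[Turn]], bool],
    summarize_to_query: Callable[[Sequence[Turn]], str],
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[Sequence[Turn], List[str]], str],
) -> Dict[str, object]:
    """Run one conversational turn: decide, rewrite, retrieve, filter, respond."""
    query = None
    passages: List[str] = []
    retrieved = needs_retrieval(history)
    if retrieved:
        # Rewrite the whole conversation (not just the last question) into a query.
        query = summarize_to_query(history)
        # Keep only the passages judged relevant before generating.
        passages = [p for p in retrieve(query) if is_relevant(query, p)]
    response = generate(history, passages)
    return {"retrieved": retrieved, "query": query,
            "passages": passages, "response": response}


# --- Toy stand-ins (assumptions; a real system would call an LLM and a retriever) ---
CORPUS = [
    "SELF-RAG trains a model to emit reflection tokens for retrieval and critique.",
    "Conversational QA requires resolving references to earlier turns.",
    "Paris is the capital of France.",
]


def _tokens(text: str) -> Set[str]:
    return {w.strip(".,?!").lower() for w in text.split()}


def toy_needs_retrieval(history: Sequence[Turn]) -> bool:
    # Assumption: only questions trigger retrieval.
    return "?" in history[-1].text


def toy_summarize_to_query(history: Sequence[Turn]) -> str:
    # Assumption: concatenating user turns stands in for a learned summary.
    return " ".join(t.text for t in history if t.speaker == "user")


def toy_retrieve(query: str, k: int = 2) -> List[str]:
    q = _tokens(query)
    ranked = sorted(CORPUS, key=lambda p: len(q & _tokens(p)), reverse=True)
    return ranked[:k]


def toy_is_relevant(query: str, passage: str) -> bool:
    return len(_tokens(query) & _tokens(passage)) >= 2


def toy_generate(history: Sequence[Turn], passages: List[str]) -> str:
    if passages:
        return "Based on the retrieved passage: " + passages[0]
    return "Answering from the conversation alone."


if __name__ == "__main__":
    chat = [
        Turn("user", "Tell me about France."),
        Turn("assistant", "France is a country in Europe."),
        Turn("user", "What is its capital?"),
    ]
    print(answer_turn(chat, toy_needs_retrieval, toy_summarize_to_query,
                      toy_retrieve, toy_is_relevant, toy_generate))
```

Passing the five steps in as callables keeps the control flow separate from the models behind it, so each toy heuristic could be swapped for an LLM call or a real retriever without changing `answer_turn`.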