A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval

Ivica Kostric, Krisztian Balog
2024-06-27
Abstract: Conversational passage retrieval is challenging as it often requires the resolution of references to previous utterances and needs to deal with the complexities of natural language, such as coreference and ellipsis. To address these challenges, pre-trained sequence-to-sequence neural query rewriters are commonly used to generate a single de-contextualized query based on conversation history. Previous research shows that combining multiple query rewrites for the same user utterance has a positive effect on retrieval performance. We propose the use of a neural query rewriter to generate multiple queries and show how to integrate those queries in the passage retrieval pipeline efficiently. The main strength of our approach lies in its simplicity: it leverages how the beam search algorithm works and can produce multiple query rewrites at no additional cost. Our contributions further include devising ways to utilize multi-query rewrites in both sparse and dense first-pass retrieval. We demonstrate that applying our approach on top of a standard passage retrieval pipeline delivers state-of-the-art performance without sacrificing efficiency.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses how to improve retrieval performance in conversational passage retrieval by generating multiple query rewrites. The authors point out that conventional conversational query rewriting methods usually generate a single de-contextualized query, which may not accurately capture the user's actual information need. In long conversations especially, a single rewrite is prone to topic drift or to missing important information. The paper therefore proposes a multi-query rewriting method (CMQR) that generates multiple query rewrites and uses them to represent the user's information need more effectively in both sparse and dense retrieval.

### Main contributions of the paper:

1. **Multi-query rewrite**: A fine-tuned generative language model generates multiple query rewrites at each conversation turn rather than a single one. The method exploits how beam search works: the search already maintains several candidate sequences, so multiple rewrites are obtained at no additional cost.
2. **Application in sparse and dense retrieval**: The paper explores how to use multiple query rewrites effectively in both retrieval settings. In sparse retrieval, the rewrites are merged into an expanded query whose term weights are re-estimated, giving a weighted bag-of-words representation. In dense retrieval, the embedding vectors of the rewrites are combined into a single final query representation.

### Experimental results:

- **Sparse retrieval**: CMQR improves MRR by 1.06 to 6.31 percentage points over single-query rewriting.
- **Dense retrieval**: CMQR improves MRR by 3.52 to 4.45 percentage points over single-query rewriting.
- **Overall performance**: CMQR achieves the best or second-best results across the evaluation metrics (such as MRR, MAP, and R@10) on the QReCC dataset, with particularly strong performance when the T5QR model is used.

### Conclusion:

The proposed method is technically simple and efficient, and it significantly improves the performance of conversational passage retrieval. In future work, the authors plan to explore multi-query rewrites in multi-stage retrieval pipelines and to determine automatically how many rewrites to consider.
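
The sparse and dense strategies summarized above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the softmax normalization of beam log-scores, and the weighted-average combination of embeddings are all assumptions made for illustration, since the summary does not give the exact formulas.

```python
import math
from collections import defaultdict


def rewrite_weights(log_scores):
    """Turn beam log-scores into weights that sum to 1 (softmax).

    Assumption: each rewrite comes with a sequence log-score from beam search.
    """
    top = max(log_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_bag_of_words(rewrites, weights):
    """Sparse side: a term's weight is the summed weight of the rewrites containing it."""
    bag = defaultdict(float)
    for text, weight in zip(rewrites, weights):
        for term in set(text.lower().split()):  # count each term once per rewrite
            bag[term] += weight
    return dict(bag)


def combine_embeddings(embeddings, weights):
    """Dense side: weighted average of the rewrite embeddings (hypothetical choice)."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for e, w in zip(embeddings, weights)) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical rewrites and beam log-scores for one conversation turn.
    rewrites = [
        "who wrote the hobbit",
        "who wrote the hobbit book",
        "who is the author of the hobbit",
    ]
    weights = rewrite_weights([-0.2, -0.9, -1.5])
    print(weighted_bag_of_words(rewrites, weights))
    # Toy 2-dimensional embeddings stand in for a real query encoder.
    print(combine_embeddings([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], weights))
```

Terms shared by every rewrite (here "hobbit") end up with the highest weight, while terms that appear only in low-scoring rewrites contribute little. The weighted bag can be passed to a sparse retriever that accepts per-term weights, and the combined vector to a dense index, so each strategy still issues a single retrieval call per turn.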