Abstract: In conversational search, the user's real search intent for the current turn is dependent on the previous conversation history. It is challenging to determine a good search query from the whole conversation context. To avoid the expensive re-training of the query encoder, most existing methods try to learn a rewriting model to de-contextualize the current query by mimicking the manual query rewriting. However, manually rewritten queries are not always the best search queries. Training a rewriting model on them would limit the model's ability to produce good search queries. Another useful hint is the potential answer to the question. In this paper, we propose ConvGQR, a new framework to reformulate conversational queries based on generative pre-trained language models (PLMs), one for query rewriting and another for generating potential answers. By combining both, ConvGQR can produce better search queries. In addition, to relate query reformulation to retrieval performance, we propose a knowledge infusion mechanism to optimize both query reformulation and retrieval. Extensive experiments on four conversational search datasets demonstrate the effectiveness of ConvGQR.

What problem does this paper attempt to address?

The paper addresses the following problem: in conversational search, how to determine the user's actual search intent from the entire conversation history and generate more effective search queries. Specifically, it proposes improvements that target the shortcomings of existing methods:

1. **Limitations of existing methods**:
   - **Limitations of manually rewritten queries**: Existing conversational query rewriting models are usually trained to imitate manually rewritten queries, but these manually rewritten queries are not always the best search queries.
   - **Separation of query rewriting and query expansion**: Query rewriting and query expansion are usually studied separately, and they have different effects. Query rewriting tends to resolve ambiguous queries and add missing information, while query expansion aims to add supplementary information to the query.
2. **Proposed new framework**:
   - **ConvGQR (Generative Query Reformulation for Conversational Search)**: This framework uses generative pre-trained language models (PLMs), one to rewrite the query and another to generate potential answers that serve as query expansion. By combining both, ConvGQR can generate better search queries.
   - **Knowledge infusion mechanism**: To relate query reformulation to the retrieval task, a knowledge infusion mechanism is proposed that optimizes both query reformulation and retrieval performance.
3. **Objectives**:
   - Improve the quality of the queries generated in conversational search so that they better match relevant documents, thereby improving retrieval effectiveness.
   - Verify through experiments that combining query rewriting and query expansion outperforms using either alone.

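
The combination of query rewriting and query expansion described above can be sketched as follows. This is a minimal illustration, not the authors' code: `reformulate`, `rewrite_query`, and `generate_answer` are hypothetical names, the two callables stand in for the two generative PLMs, and joining their outputs by plain concatenation is an assumption about how the two are combined.

```python
from typing import Callable, List

# Hypothetical type for a generative model call: (history, current query) -> text.
Generator = Callable[[List[str], str], str]


def reformulate(
    history: List[str],
    current_query: str,
    rewrite_query: Generator,
    generate_answer: Generator,
) -> str:
    """Build a search query from a rewritten query plus a potential answer."""
    # Query rewriting: de-contextualize the current turn.
    rewritten = rewrite_query(history, current_query).strip()
    # Query expansion: a generated potential answer adds supplementary terms.
    answer = generate_answer(history, current_query).strip()
    # Assumption: the two outputs are combined by simple concatenation.
    return " ".join(part for part in (rewritten, answer) if part)


# Toy stand-ins for the two PLMs, used only to make the sketch runnable.
def toy_rewriter(history: List[str], query: str) -> str:
    return "Who directed the movie Inception?"


def toy_answerer(history: List[str], query: str) -> str:
    return "Christopher Nolan"


history = [
    "Tell me about the movie Inception.",
    "It is a 2010 science fiction film.",
]
search_query = reformulate(history, "Who directed it?", toy_rewriter, toy_answerer)
print(search_query)
```

In a real system the two callables would wrap trained sequence-to-sequence models, and the resulting string would be passed to the retriever as the query.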
### Formula representation

The formulas involved in the paper are as follows:

- The objective function of the query rewriting model is:
  \[ \theta_M = \arg\max_{\theta_M} \prod_{k=1}^{i-1} \Pr(q^* \mid M\{H_k, q_i\}, \theta_M) \]
  where \(q^*\) is the supervision signal (i.e., the manually rewritten query in the training data), \(H_k\) is the conversation history, and \(q_i\) is the current query.
- The generation loss function is:
  \[ L_{\text{gen}} = -\sum_{t=1}^{T} \log\left(\Pr(w_t \mid w_{1:t-1}, H_k, q_i)\right) \]
  where \(w_t\) is the \(t\)-th token of the target sequence and \(T\) is its length.
- The mean squared error (MSE) loss function in the knowledge infusion mechanism is:
  \[ L_{\text{ret}} = \text{MSE}(h_S, h_p^+) \]
  where \(h_S\) is the session query representation and \(h_p^+\) is the relevant document representation.
- The overall training objective is:
  \[ L_{\text{ConvGQR}} = L_{\text{gen}} + \alpha \cdot L_{\text{ret}} \]
  where \(\alpha\) is a weighting factor used to balance the influence of generation and retrieval.

Through these improvements, the ConvGQR framework can generate more effective queries in conversational search, thereby improving retrieval performance.
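
To make the overall objective concrete, the sketch below evaluates \(L_{\text{gen}}\), \(L_{\text{ret}}\), and their weighted sum on toy numbers using only the Python standard library. The function names and input values are hypothetical, and averaging the squared error over the vector dimensions is an assumption about how the MSE is reduced; a real implementation would compute these quantities on model outputs inside a deep learning framework.

```python
import math
from typing import Sequence


def generation_loss(token_probs: Sequence[float]) -> float:
    """L_gen: negative log-likelihood of the target tokens.

    token_probs[t] is the probability the model assigns to the t-th
    target token given the previous tokens and the conversation context.
    """
    return -sum(math.log(p) for p in token_probs)


def retrieval_loss(h_s: Sequence[float], h_p: Sequence[float]) -> float:
    """L_ret: mean squared error between two equal-length vectors."""
    if len(h_s) != len(h_p) or len(h_s) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    # Assumption: the squared error is averaged over the dimensions.
    return sum((a - b) ** 2 for a, b in zip(h_s, h_p)) / len(h_s)


def convgqr_loss(
    token_probs: Sequence[float],
    h_s: Sequence[float],
    h_p: Sequence[float],
    alpha: float,
) -> float:
    """L_ConvGQR = L_gen + alpha * L_ret."""
    return generation_loss(token_probs) + alpha * retrieval_loss(h_s, h_p)


# Toy values: two target tokens and two-dimensional representations.
token_probs = [0.5, 0.25]   # L_gen = -(ln 0.5 + ln 0.25) = ln 8
h_s = [1.0, 0.0]            # session query representation (made up)
h_p = [0.0, 0.0]            # relevant passage representation (made up)
total = convgqr_loss(token_probs, h_s, h_p, alpha=0.5)
print(round(total, 4))
```

With these toy inputs the retrieval term is 0.5, so with \(\alpha = 0.5\) it contributes 0.25 on top of the generation loss \(\ln 8\). A larger \(\alpha\) pulls the session representation more strongly toward the relevant passage representation.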

ConvGQR: Generative Query Reformulation for Conversational Search

Generative Query Reformulation for Effective Adhoc Search

IterCQR: Iterative Conversational Query Reformulation with Retrieval Guidance

Conversational Query Reformulation with the Guidance of Retrieved Documents

Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search

AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment

Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback

ZeQR: Zero-shot Query Reformulation for Conversational Search

Few-Shot Generative Conversational Query Rewriting

Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers

CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search

Generating Relevant and Informative Questions for Open-domain Conversations

GenQREnsemble: Zero-Shot LLM Ensemble Prompting for Generative Query Reformulation

GenCRF: Generative Clustering and Reformulation Framework for Enhanced Intent-Driven Information Retrieval

Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting

Pre-Training for Query Rewriting in A Spoken Language Understanding System

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation

Mixed-initiative Query Rewriting in Conversational Passage Retrieval

Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting

Translating Embeddings For Modeling Query Reformulation

CO3: Low-resource Contrastive Co-training for Generative Conversational Query Rewrite