Abstract: With the increasing popularity of conversational search, how to evaluate the performance of conversational search systems has become an important question in the IR community. Existing work on conversational search evaluation falls mainly into two streams: (1) constructing metrics based on semantic similarity (e.g., BLEU, METEOR, and BERTScore), or (2) directly evaluating the response ranking performance of the system using traditional search metrics (e.g., nDCG, RBP, and nERR). However, these methods ignore either the information need of the user or the mixed-initiative property of conversational search. This raises the question of how to accurately model user satisfaction in conversational search scenarios. Since explicitly asking users to provide satisfaction feedback is difficult, traditional IR studies often rely on the Cranfield paradigm (i.e., third-party annotation) and user behavior modeling to estimate user satisfaction in search. However, the feasibility and effectiveness of these two approaches have not been fully explored in conversational search. In this paper, we examine the evaluation of conversational search from the perspective of user satisfaction. We build a novel conversational search experimental platform and construct a Chinese open-domain conversational search behavior dataset containing rich annotations and search behavior data. We also collect third-party satisfaction annotations at the session level and the turn level to investigate the feasibility of the Cranfield paradigm in the conversational search scenario. Experimental results show both some consistency and considerable differences between the user satisfaction annotations and the third-party annotations. We also propose dialog continuation or ending behavior models (DCEBM) to capture session-level user satisfaction based on turn-level information.

An In-depth Investigation of User Response Simulation for Conversational Search

Human-in-the-loop

Exploiting Simulated User Feedback for Conversational Search: Ranking, Rewriting, and Beyond

Analysing Utterances in LLM-based User Simulation for Conversational Search

Towards Better Understanding of User Satisfaction in Open-Domain Conversational Search

Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access

SimUser: Generating Usability Feedback by Simulating Various Users Interacting with Mobile Applications

Simulating and Modeling the Risk of Conversational Search

State of the Art of User Simulation approaches for conversational information retrieval

A Survey of Conversational Search

Leveraging User Simulation to Develop and Evaluate Conversational Information Access Agents

USimAgent: Large Language Models for Simulating Search Users

Towards Conversational Search and Recommendation: System Ask, User Respond

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation

Response Enhanced Semi-supervised Dialogue Query Generation

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation

Structured and Natural Responses Co-generation for Conversational Search

A Survey on Response Selection for Retrieval-based Dialogues

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems

User Simulation for Evaluating Information Access Systems