Abstract: Conversational Recommender Systems (CRSs) interact with users through natural language to understand their preferences and provide personalized recommendations in real time. CRSs have demonstrated significant potential, which has made the development of more realistic and reliable user simulators a key research focus. Recently, the capabilities of Large Language Models (LLMs) have attracted considerable attention in various fields, and efforts are underway to construct user simulators based on LLMs. While these works are innovative, they also have limitations that require attention. In this work, we analyze the limitations of using LLMs to construct user simulators for CRS, with the aim of guiding future research. To this end, we conduct analytical validation on a notable work, iEvaLM. Through multiple experiments on two widely used datasets in the field of conversational recommendation, we highlight several issues with the current evaluation methods for LLM-based user simulators: (1) Data leakage, which occurs in the conversational history and the user simulator's replies, inflates evaluation results. (2) The success of CRS recommendations depends more on the availability and quality of conversational history than on the responses from user simulators. (3) Controlling the output of the user simulator through a single prompt template proves challenging. To overcome these limitations, we propose SimpleUserSim, which employs a straightforward strategy to guide the topic toward the target items. Our study validates the ability of CRS models to utilize the interaction information, significantly improving the recommendation results.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper analyzes the limitations of current user simulators built on large language models (LLMs) for conversational recommender systems (CRS) and proposes a solution. Specifically, it focuses on the following issues:

1. **Data leakage**: Data leakage occurs in the conversation history and in the responses of the user simulator, which inflates the evaluation results.
2. **Recommendation success depends on conversation history rather than on the user simulator**: Whether a CRS recommends successfully depends more on the quality of the conversation history than on the responses of the user simulator.
3. **Difficulty in controlling user simulator output with a single prompt template**: A single prompt template gives little fine-grained control over the simulator's responses across different scenarios.

### Main findings of the paper

1. **Data leakage**:
   - Data leakage occurs in the conversation history and in the responses of the user simulator, which inflates the evaluation results.
   - When conversations whose recommendations succeed only because of data leakage are excluded, the performance of all baseline models drops significantly, indicating that data leakage has a large impact on the evaluation results.
2. **Dependence of recommendation success**:
   - Whether a CRS recommends successfully depends more on the quality of the conversation history than on the responses of the user simulator.
   - A successful recommendation in the first round of interaction means the CRS succeeded based on the conversation history alone.
   - In subsequent rounds, the CRS makes poor use of the interaction information provided by the user simulator.
3. **Output control of user simulator**:
   - Controlling the output of the user simulator through a single prompt template is challenging, especially in complex conversation scenarios.
   - Current user simulators perform poorly at generating the expected responses, especially in small-talk scenarios.

### Solutions

To alleviate the above problems, the paper proposes a simple user simulator, SimpleUserSim. Its main improvements are:

1. **Ensure that the user simulator only knows the attribute information of the target item**: Until a successful recommendation is made, the user simulator does not know the title of the target item.
2. **Take different actions according to the intention of the CRS**:
   - **Small talk**: Generate a conversation flow based on the current topic and preferences.
   - **Inquiry**: Respond to CRS questions according to real-time preferences.
   - **Recommendation**: Check whether the recommended item is consistent with the target item and provide positive or negative feedback.

### Experimental results

The experimental results show that SimpleUserSim outperforms existing user simulators in several aspects:

- **Significantly reduces the data leakage problem caused by the user simulator**.
- **Shows better performance in multi-round interactions**, especially in the second to fifth rounds of interaction.
- **Can better express preferences in small-talk scenarios**, enabling the CRS to use the responses of the user simulator more effectively for recommendation.

### Conclusion

By analyzing existing LLM-based user simulators, the paper reveals their limitations in conversational recommender systems and proposes a simple and effective solution, SimpleUserSim. This research provides a valuable reference for future work on conversational recommender systems.
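The SimpleUserSim strategy summarized above (attributes only, one action per CRS intention) can be illustrated with a minimal sketch. This is not the paper's implementation: the actual simulator uses an LLM to generate replies, whereas the sketch substitutes fixed reply templates. The class `SketchUserSim`, the helper `leaks_title`, the intent labels, and the example item are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TargetItem:
    title: str                   # must stay out of replies until a correct recommendation
    attributes: dict[str, str]   # e.g. {"genre": "crime"}; the only info the simulator may use


def leaks_title(text: str, title: str) -> bool:
    """Return True if the target title appears verbatim (case-insensitive) in text."""
    return title.lower() in text.lower()


@dataclass
class SketchUserSim:
    """Template-based stand-in for an LLM-backed simulator (assumption, not the paper's code)."""
    target: TargetItem
    revealed: bool = False       # flips to True once the CRS recommends the target

    def respond(self, intent: str, payload: str = "") -> str:
        if intent == "small_talk":
            # Express preferences through attributes only.
            prefs = ", ".join(sorted(self.target.attributes.values()))
            reply = f"I am in the mood for something with these traits: {prefs}."
        elif intent == "inquiry":
            # payload is the attribute name the CRS asks about.
            value = self.target.attributes.get(payload)
            if value is None:
                reply = "I have no strong preference about that."
            else:
                reply = f"I prefer {value}."
        elif intent == "recommendation":
            # payload is the recommended title; compare it with the target.
            if payload.strip().lower() == self.target.title.lower():
                self.revealed = True
                reply = f"Yes, {self.target.title} is the one I wanted."
            else:
                reply = "That is not quite what I am looking for."
        else:
            raise ValueError(f"unknown intent: {intent!r}")

        # Guard against data leakage: the title may appear only after success.
        if not self.revealed and leaks_title(reply, self.target.title):
            raise RuntimeError("reply would leak the target title")
        return reply


# Illustrative usage with a made-up item.
sim = SketchUserSim(TargetItem("The Example Heist", {"genre": "crime", "era": "1990s"}))
print(sim.respond("small_talk"))
print(sim.respond("inquiry", "genre"))
print(sim.respond("recommendation", "Some Other Film"))
print(sim.respond("recommendation", "The Example Heist"))
```

The same `leaks_title` check could in principle be applied to conversation histories when filtering leaked dialogues before scoring, although the exact filtering rule used in the paper is not specified in this summary.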

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation
