PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

Chuyi Kong,Yaxin Fan,Xiang Wan,Feng Jiang,Benyou Wang
2024-05-28
Abstract:The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT dialogues, as evidenced by Vicuna. However, due to challenges in gathering dialogues involving human participation, current endeavors like Baize and UltraChat rely on ChatGPT conducting roleplay to simulate humans based on instructions, resulting in overdependence on seeds, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations. Specifically, we directly target human questions extracted from genuine human-machine conversations as a learning goal and provide a novel user simulator called `Socratic'. The experimental results show our response model, `PlatoLM', achieves SoTA performance among LLaMA-based 7B models in MT-Bench. Our findings further demonstrate that our method introduces highly human-like questioning patterns and rich topic structures, which can teach the response model better than previous works in multi-round conversations.
Computation and Language,Artificial Intelligence
What problem does this paper attempt to address?
The main problem that this paper attempts to solve is how to more realistically simulate human behavior in multi - turn conversations to improve the performance of large - language models (LLMs) in dialogue tasks. Specifically, the author points out several limitations of current methods: 1. **Dependence on seed conversations**: Many existing methods rely on ChatGPT for static role - playing to simulate human conversations, which results in the conversation content being overly dependent on seed conversations and lacking diversity. 2. **Lack of real multi - turn conversation dynamics**: Since static simulation is difficult to capture real - life human conversation patterns, the generated conversations often lack the natural multi - turn interaction characteristics. 3. **Single - topic structure**: Existing simulation methods are difficult to produce rich topic structures, limiting the diversity and depth of conversations. To solve these problems, the author proposes a new paradigm by training a learnable user simulator (called "Socratic") that directly targets real - human questions for learning. This user simulator can more naturally engage in multi - turn conversations with system agents (such as ChatGPT), thereby generating a conversation dataset (called "SocraticChat") that is closer to real - life scenarios. Finally, a new response model (called "PlatoLM") is trained based on this dataset to improve its performance in multi - turn conversations. ### Main contributions 1. **Proposed an effective human - behavior - simulation paradigm**: By reversing the learning objective, from ChatGPT's answers to real - user questions, the simulator becomes more human - like. 2. **Provided multiple versions of multi - turn conversation datasets**: These datasets expand the scale and diversity of existing conversation datasets. 3. **Trained a new response model, PlatoLM**: With a small number of training samples, PlatoLM performs well in multiple benchmark tests, especially outperforming other models in multi - turn conversation tasks. 4. **Discovered that more human - like questioning patterns are helpful for teaching**: Compared to static role - playing, human - like questioning patterns in dynamic multi - turn conversations can better guide the learning of dialogue models. Through these improvements, the paper demonstrates how to improve the performance of large - language models in multi - turn conversations through more realistic conversation simulation.