Abstract: Dialogue State Tracking (DST) is designed to monitor the evolving dialogue state in the conversations and plays a pivotal role in developing task-oriented dialogue systems. However, obtaining the annotated data for the DST task is usually a costly endeavor. In this paper, we focus on employing LLMs to generate dialogue data to reduce dialogue collection and annotation costs. Specifically, GPT-4 is used to simulate the user and agent interaction, generating thousands of dialogues annotated with DST labels. Then a two-stage fine-tuning on LLaMA 2 is performed on the generated data and the real data for the DST prediction. Experimental results on two public DST benchmarks show that with the generated dialogue data, our model performs better than the baseline trained solely on real data. In addition, our approach is also capable of adapting to the dynamic demands in real-world scenarios, generating dialogues in new domains swiftly. After replacing dialogue segments in any domain with the corresponding generated ones, the model achieves comparable performance to the model trained on real data.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper addresses the high cost of data annotation for dialogue state tracking (DST). The authors propose generating dialogue data with large language models (LLMs) to reduce the cost of collecting and annotating dialogues. GPT-4 simulates interactions between users and agents to produce thousands of dialogues with DST labels, and LLaMA 2 is then fine-tuned in two stages, first on the generated data and then on real data, to improve DST prediction performance.
### Main Contributions
1. **Proposed New Framework**: Utilized GPT-4 to generate new labeled dialogue data, effectively reducing the cost of collecting and annotating dialogue data.
2. **Experimental Results**: Experiments on two public DST benchmarks, MultiWOZ 2.2 and 2.4, show that adding the generated data improves performance over a model trained only on real data (by 0.83% and 1%, respectively).
3. **Adaptation to New Domains**: The method can quickly generate dialogue data for new domains while maintaining good performance.
4. **Scalability**: The authors believe that this method has the potential to be extended to other dialogue-related tasks.
### Method Overview
1. **Problem Definition**: Defined the multi-turn dialogue context and goal in task-oriented dialogues, which is to predict the dialogue state, consisting of a series of (slot, value) pairs.
2. **Using LLaMA 2 to Predict Dialogue State**: Fine-tuned all parameters of LLaMA 2 and used pre-designed prompts to guide the model to output its predictions in JSON format.
3. **GPT-4 Based User-Agent Dialogue Simulation**: GPT-4 was used to generate dialogues between users and agents, including user needs, search result reports, recommendations, attribute queries, and action requests.
4. **Two-Stage Fine-Tuning Strategy**: First fine-tuned LLaMA 2 with the generated data, then continued fine-tuning with real data to ensure the model can effectively bridge the gap between generated and real data.
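The first two steps above can be made concrete with a small sketch: a dialogue state is a set of (slot, value) pairs, and the model is prompted to return it as JSON. The prompt wording, the slot names (such as `hotel-area`), and the helper names `build_prompt` and `parse_state` are assumptions made for illustration; they are not taken from the paper, and the model output here is a hand-written stand-in rather than a real LLaMA 2 response.

```python
import json


def build_prompt(turns):
    """Render a multi-turn dialogue context as one prompt string.

    The instruction wording is illustrative, not the paper's prompt.
    """
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    return (
        "Track the dialogue state of the conversation below.\n"
        + "\n".join(lines)
        + "\nReturn the state as a JSON object mapping slot names to values."
    )


def parse_state(model_output):
    """Parse a JSON-formatted prediction into a slot -> value dict.

    Returns an empty state if the output is not a valid JSON object.
    Values are lower-cased and stripped so comparisons are less brittle.
    """
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return {}
    if not isinstance(parsed, dict):
        return {}
    return {str(slot): str(value).strip().lower() for slot, value in parsed.items()}


# A toy dialogue context: (speaker, utterance) pairs.
turns = [
    ("user", "I need a cheap hotel in the north."),
    ("agent", "Sure, for how many nights?"),
    ("user", "Three nights, starting Friday."),
]
prompt = build_prompt(turns)

# Stand-in for the fine-tuned model's output (hypothetical slot names).
fake_output = (
    '{"hotel-pricerange": "cheap", "hotel-area": "north", '
    '"hotel-stay": "3", "hotel-day": "Friday"}'
)
state = parse_state(fake_output)
```

Asking for JSON keeps the prediction machine-readable, and falling back to an empty state on malformed output is one simple way to keep evaluation from crashing on a bad generation.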
### Experimental Results
1. **Performance Improvement**: On the MultiWOZ 2.2 and 2.4 datasets, LLaMA 2 fine-tuned only with real data (LUAS R) already surpassed previous DST baseline models. After adding the generated data (LUAS R+G), performance further improved by 0.83% and 1%, respectively.
2. **Data Replacement Experiment**: On the MultiWOZ 2.2 dataset, replacing specific domain dialogue data with generated data resulted in an average decrease of 0.75% in joint goal accuracy (JGA) on the test set, but overall performance remained good, indicating that the generated data can effectively adapt to new domains.
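For readers unfamiliar with the metric, joint goal accuracy is commonly computed as the fraction of dialogue turns whose predicted state matches the gold state on every slot. The sketch below assumes that common definition and uses made-up toy states; the function name `joint_goal_accuracy` and the slot names are illustrative, and the paper's exact evaluation script may differ in details such as value normalization.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose predicted state equals the gold state exactly.

    Each state is a dict mapping slot names to values; a turn counts as
    correct only if every slot and value matches.
    """
    if len(predicted_states) != len(gold_states):
        raise ValueError("prediction and gold lists must have the same length")
    if not gold_states:
        return 0.0
    correct = sum(
        1 for pred, gold in zip(predicted_states, gold_states) if pred == gold
    )
    return correct / len(gold_states)


# Toy gold states for four turns of one dialogue (hypothetical slots).
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-pricerange": "cheap"},
    {"hotel-area": "north", "hotel-pricerange": "cheap", "hotel-stay": "3"},
    {"hotel-area": "north", "hotel-pricerange": "cheap", "hotel-stay": "3",
     "hotel-day": "friday"},
]

# Predictions: the third turn gets one slot value wrong, so it counts as a miss.
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-pricerange": "cheap"},
    {"hotel-area": "north", "hotel-pricerange": "cheap", "hotel-stay": "2"},
    {"hotel-area": "north", "hotel-pricerange": "cheap", "hotel-stay": "3",
     "hotel-day": "friday"},
]

jga = joint_goal_accuracy(pred, gold)
```

Because a single wrong slot makes the whole turn incorrect, the metric is strict, which helps put the sub-1% differences reported above in context.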
### Practical Application
This method provides a quick way to automate dialogue generation, allowing for the rapid development of dialogue systems in new domains, saving a significant amount of time and cost.