Abstract: Dialogue State Tracking (DST) is designed to monitor the evolving dialogue state in a conversation and plays a pivotal role in developing task-oriented dialogue systems. However, obtaining annotated data for the DST task is usually costly. In this paper, we focus on employing large language models (LLMs) to generate dialogue data in order to reduce dialogue collection and annotation costs. Specifically, GPT-4 is used to simulate the interaction between user and agent, generating thousands of dialogues annotated with DST labels. A two-stage fine-tuning of LLaMA 2 is then performed on the generated data and the real data for DST prediction. Experimental results on two public DST benchmarks show that, with the generated dialogue data, our model performs better than the baseline trained solely on real data. In addition, our approach can adapt to the dynamic demands of real-world scenarios, swiftly generating dialogues in new domains. After the dialogue segments of any domain are replaced with the corresponding generated ones, the model achieves performance comparable to that of the model trained on real data.
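The data-generation step described in the abstract can be pictured as a loop in which one model plays the user and another plays the agent, with a DST label recorded after each user turn. This is a minimal sketch under stated assumptions, not the paper's implementation: `simulate_dialogue`, `toy_user`, and `toy_agent` are hypothetical names, the GPT-4 calls are replaced by offline stub functions, agent behaviors such as search-result reports are reduced to a fixed reply, and the user model is assumed to report which (slot, value) pairs it has just expressed, so the label is simply the running union of those pairs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: in the described pipeline both roles would be played by
# prompting GPT-4; here they are plain Python callables so the sketch runs offline.
# A user model returns its next utterance plus the (slot, value) pairs it just expressed.
UserModel = Callable[[Dict[str, str], Dict[str, str]], Tuple[str, Dict[str, str]]]
# An agent model maps the dialogue history so far to the agent's reply.
AgentModel = Callable[[List[Tuple[str, str]]], str]


def simulate_dialogue(goal: Dict[str, str], user: UserModel, agent: AgentModel,
                      max_turns: int = 10) -> List[dict]:
    """Alternate user and agent turns, recording a cumulative DST label after each user turn."""
    history: List[Tuple[str, str]] = []
    state: Dict[str, str] = {}
    labeled_turns: List[dict] = []
    for _ in range(max_turns):
        if state == goal:  # every goal slot has been conveyed, so end the dialogue
            break
        utterance, new_slots = user(goal, state)
        history.append(("user", utterance))
        state = {**state, **new_slots}  # the dialogue state accumulates across turns
        labeled_turns.append({"user": utterance, "state": dict(state)})
        history.append(("agent", agent(history)))
    return labeled_turns


# Toy stand-ins for the two LLM roles (assumptions, not GPT-4 prompts).
def toy_user(goal: Dict[str, str], state: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    slot = next(s for s in goal if s not in state)  # reveal one unmet goal slot per turn
    return f"I need {slot} to be {goal[slot]}.", {slot: goal[slot]}


def toy_agent(history: List[Tuple[str, str]]) -> str:
    return "Noted. Anything else?"


goal = {"restaurant-food": "italian", "restaurant-area": "centre"}
for turn in simulate_dialogue(goal, toy_user, toy_agent):
    print(turn)
```

Because every slot value is introduced by the simulator itself, each generated turn carries its label without separate annotation; a real pipeline would likely also need to check that the LLM's utterances actually express the slots it claims to have expressed.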

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the high cost of data annotation for dialogue state tracking (DST). The authors propose generating dialogue data with large language models (LLMs) to reduce the cost of collecting and annotating dialogues. GPT-4 simulates interactions between users and agents, producing thousands of dialogues with DST labels, and LLaMA 2 is then fine-tuned in two stages, on the generated data and then on real data, to improve DST prediction.

### Main Contributions

1. **New framework**: GPT-4 is used to generate new labeled dialogue data, reducing the cost of collecting and annotating dialogues.
2. **Experimental results**: On two public DST benchmarks, adding the generated data improves model performance over training on real data alone.
3. **Adaptation to new domains**: The method can quickly generate dialogue data for new domains while maintaining good performance.
4. **Scalability**: The authors suggest that the method could be extended to other dialogue-related tasks.

### Method Overview

1. **Problem definition**: Given the multi-turn context of a task-oriented dialogue, the goal is to predict the dialogue state, a set of (slot, value) pairs.
2. **Predicting the dialogue state with LLaMA 2**: LLaMA 2 is fine-tuned with full parameters, and pre-designed prompts guide the model to output its predictions in JSON format.
3. **GPT-4-based user-agent simulation**: GPT-4 generates dialogues between users and agents covering user needs, search result reports, recommendations, attribute queries, and action requests.
4. **Two-stage fine-tuning strategy**: LLaMA 2 is first fine-tuned on the generated data and then further fine-tuned on real data, so that the model can bridge the gap between generated and real data.

### Experimental Results

1. **Performance improvement**: On MultiWOZ 2.2 and 2.4, LLaMA 2 fine-tuned only on real data (LUAS R) already surpasses previous DST baselines. Adding the generated data (LUAS R+G) improves performance by a further 0.83% and 1%, respectively.
2. **Data replacement experiment**: On MultiWOZ 2.2, replacing the dialogues of a given domain with generated data lowers joint goal accuracy (JGA) on the test set by 0.75% on average, yet overall performance remains good, indicating that the generated data can effectively cover new domains.

### Practical Application

The method offers a fast, automated way to generate dialogues, enabling rapid development of dialogue systems in new domains while reducing time and annotation cost.
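Because LLaMA 2 is prompted to emit the dialogue state as JSON, evaluation involves parsing that output and comparing it with the gold state; joint goal accuracy (JGA) counts a turn as correct only when every (slot, value) pair matches. The following is a minimal sketch, assuming a flat JSON object that maps slot names to values (the paper's exact output schema is not given here) and case-insensitive value matching; `normalize`, `parse_state`, and `joint_goal_accuracy` are hypothetical helpers, and the example slots are illustrative only.

```python
import json
from typing import Dict, List


def normalize(state: Dict[str, object]) -> Dict[str, str]:
    """Lowercase and strip values so 'Italian ' and 'italian' compare equal (an assumed convention)."""
    return {str(slot): str(value).strip().lower() for slot, value in state.items()}


def parse_state(prediction: str) -> Dict[str, str]:
    """Parse a model's JSON output into {slot: value}; malformed or non-object output yields {}."""
    try:
        obj = json.loads(prediction)
    except json.JSONDecodeError:
        return {}
    return normalize(obj) if isinstance(obj, dict) else {}


def joint_goal_accuracy(predictions: List[str], golds: List[Dict[str, str]]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    correct = sum(parse_state(p) == normalize(g) for p, g in zip(predictions, golds))
    return correct / len(golds)


predictions = [
    '{"restaurant-food": "Italian"}',                              # correct after normalization
    '{"restaurant-food": "italian", "restaurant-area": "north"}',  # one wrong value
    "restaurant-food = italian",                                   # not valid JSON
]
golds = [
    {"restaurant-food": "italian"},
    {"restaurant-food": "italian", "restaurant-area": "centre"},
    {"restaurant-food": "italian", "restaurant-area": "centre"},
]
print(joint_goal_accuracy(predictions, golds))
```

Treating malformed JSON as an empty state means a formatting error simply counts as a wrong turn instead of crashing evaluation; with the toy data above, only the first of the three turns is fully correct.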

Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation

GLMDST: Chinese Dialogue State Tracking Framework Driven by LLM

Exploiting domain-slot related keywords description for Few-Shot Cross-Domain Dialogue State Tracking

Dual Learning for Dialogue State Tracking

Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Cost-Sensitive Active Learning for Dialogue State Tracking

STN4DST: A Scalable Dialogue State Tracking based on Slot Tagging Navigation

Label-Aware Auxiliary Learning for Dialogue State Tracking

Non-Autoregressive Dialog State Tracking

Dialogue State Distillation Network with Inter-Slot Contrastive Learning for Dialogue State Tracking

Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking

Enhanced Multi-Domain Dialogue State Tracker with Second-Order Slot Interactions

Amendable Generation for Dialogue State Tracking

Turn-Level Active Learning for Dialogue State Tracking

Domain Adaptive Meta-Learning for Dialogue State Tracking

SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

Progressive Dialogue State Tracking for Multi-Domain Dialogue Systems

MetaASSIST: Robust Dialogue State Tracking with Meta Learning

Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues

S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs

Diverse and Effective Synthetic Data Generation for Adaptable Zero-Shot Dialogue State Tracking