Dialog State Tracking with Reinforced Data Augmentation

Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
DOI: https://doi.org/10.48550/arXiv.1908.07795
2019-11-18
Abstract: Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data. In this paper, we address this difficulty by proposing a reinforcement learning (RL) based framework for data augmentation that can generate high-quality data to improve the neural state tracker. Specifically, we introduce a novel contextual bandit generator to learn fine-grained augmentation policies that can generate new effective instances by choosing suitable replacements for the specific context. Moreover, by alternately learning between the generator and the state tracker, we can keep refining the generative policies to generate more high-quality training data for neural state tracker. Experimental results on the WoZ and MultiWoZ (restaurant) datasets demonstrate that the proposed framework significantly improves the performance over the state-of-the-art models, especially with limited training data.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the limited performance of neural dialog state tracking (DST) models caused by the insufficient quantity and diversity of annotated training data. To tackle this, the authors propose a reinforcement learning (RL) based data augmentation framework that generates high-quality data to improve the neural state tracker. Its core is a novel contextual bandit generator that learns fine-grained augmentation policies, producing new, effective instances by choosing suitable replacements for the specific context. The generator and the state tracker are trained alternately, so the generation policy keeps being refined and yields more high-quality training data for the tracker. Experiments on the WoZ and MultiWoZ (restaurant) datasets show that the framework significantly improves performance over state-of-the-art models, especially when training data is limited.
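
To make the idea of a contextual bandit generator more concrete, here is a minimal Python sketch. It is not the authors' model. It assumes a tabular softmax policy updated with a REINFORCE-style rule, a hand-written candidate list, and a fixed toy reward that stands in for the feedback the real framework obtains from the state tracker. The names `SoftmaxBandit`, `CANDIDATES`, `LABEL_PRESERVING`, `toy_reward`, and `train_generator` are all hypothetical, and the alternating retraining of the tracker is left out.

```python
import math
import random

# Hypothetical replacement candidates for the word "cheap" in a user utterance.
CANDIDATES = ["inexpensive", "low-cost", "tasty"]
# Assumption: these replacements keep the dialog state label unchanged.
LABEL_PRESERVING = {"inexpensive", "low-cost"}


class SoftmaxBandit:
    """Tabular contextual bandit: one preference score per (context, action)."""

    def __init__(self, lr=0.5, seed=0):
        self.lr = lr
        self.prefs = {}  # (context, action) -> preference score
        self.rng = random.Random(seed)

    def probs(self, context, actions):
        """Softmax over the preference scores of the given actions."""
        scores = [self.prefs.get((context, a), 0.0) for a in actions]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, context, actions):
        """Draw one action according to the current policy."""
        weights = self.probs(context, actions)
        return self.rng.choices(actions, weights=weights, k=1)[0]

    def update(self, context, actions, chosen, reward):
        """REINFORCE step: gradient of log-softmax is 1[a == chosen] - p(a)."""
        p = self.probs(context, actions)
        for action, pa in zip(actions, p):
            grad = (1.0 if action == chosen else 0.0) - pa
            key = (context, action)
            self.prefs[key] = self.prefs.get(key, 0.0) + self.lr * reward * grad


def toy_reward(replacement):
    """Toy stand-in for tracker feedback: +1 if the label is preserved, else -1."""
    return 1.0 if replacement in LABEL_PRESERVING else -1.0


def train_generator(bandit, context, steps=200):
    """Sample replacements, update the policy, and keep the rewarded instances."""
    augmented = []
    for _ in range(steps):
        choice = bandit.sample(context, CANDIDATES)
        reward = toy_reward(choice)
        bandit.update(context, CANDIDATES, choice, reward)
        if reward > 0:
            augmented.append(context.replace("cheap", choice))
    return augmented


if __name__ == "__main__":
    bandit = SoftmaxBandit()
    utterance = "i want a cheap restaurant"
    new_instances = train_generator(bandit, utterance)
    print(len(new_instances), bandit.probs(utterance, CANDIDATES))
```

In this sketch the policy shifts probability away from the replacement that changes the meaning of the utterance, so later samples are more likely to be usable as training data. In the framework described above, the reward would instead depend on the current state tracker, and the tracker would be retrained on the augmented data between rounds of generator updates.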