Data Augmentation for Retrieval- and Generation-Based Dialog Systems

Qingqing Zhu, Xiwei Wang, Chen, Junfei Liu
DOI: https://doi.org/10.1109/iccc51575.2020.9344922
2020-01-01
Abstract: In this study, we improve the performance of dialog systems through data augmentation. In contrast to previous work, which proposes novel models, we enlarge the training data to achieve the same effect. We assume that a many-to-many relationship exists between queries and replies in dialog systems, and we try to find such query-reply pairs. Specifically, we use an existing conversational corpus (which consists of one-to-one query-reply pairs) to build a state-of-the-practice retrieval system; the agent is then asked to select the k most likely replies for each query in the corpus, and these are added to the original conversation dataset. This yields a many-to-many dialogue corpus, which we call the augmented corpus. Finally, we use the original corpus and the augmented corpus to conduct experiments on both generative and neural retrieval dialogue systems, using a dual retrieval model and a sequence-to-sequence generative model, and obtain a strong performance improvement solely by replacing the training corpus.
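
The augmentation procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the retrieval system's scoring function, so this example assumes a simple token-overlap (Jaccard) similarity between queries as a stand-in; the names `jaccard`, `augment`, and the toy corpus are hypothetical and serve only to show how one-to-one pairs might be expanded into many-to-many pairs.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (query, reply)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. Assumption: a stand-in for the
    retrieval system's real scorer, which the abstract does not specify."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def augment(corpus: List[Pair], k: int = 2) -> List[Pair]:
    """Pair each query with the replies of its k most similar other
    queries in the corpus, returning only the newly created pairs."""
    existing = set(corpus)
    augmented: List[Pair] = []
    for i, (query, _) in enumerate(corpus):
        scored = [
            (jaccard(query, other_query), j)
            for j, (other_query, _) in enumerate(corpus)
            if j != i
        ]
        # Highest similarity first; break ties by corpus position.
        scored.sort(key=lambda s: (-s[0], s[1]))
        for score, j in scored[:k]:
            candidate = (query, corpus[j][1])
            # Skip unrelated replies and pairs already in the corpus.
            if score > 0 and candidate not in existing:
                existing.add(candidate)
                augmented.append(candidate)
    return augmented


# Toy one-to-one corpus (hypothetical data for illustration only).
corpus = [
    ("how are you today", "I am fine, thanks"),
    ("how are you doing", "Doing well, and you?"),
    ("what is your name", "My name is Bot"),
]

augmented_corpus = augment(corpus, k=1)
training_data = corpus + augmented_corpus  # many-to-many training set
```

In this sketch the two "how are you" queries exchange replies, while the unrelated third query gains no new pairs. A downstream retrieval or sequence-to-sequence model would then be trained on `training_data` instead of `corpus`.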