DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan

DOI: https://doi.org/10.48550/arXiv.2306.16770

2023-06-29

Abstract: In open-domain dialogue generation tasks, contexts and responses in most datasets are mapped one-to-one, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts. Without such patterns, models generalize poorly and prefer to respond safely. Many attempts have been made either in multi-turn settings from a one-to-many perspective, or from a many-to-many perspective but limited to single-turn settings. The major challenge in many-to-many augmentation of multi-turn dialogues is that discretely replacing each turn with a semantically similar one breaks fragile context coherence. In this paper, we propose the DialoGue Path Sampling (DialoGPS) method in continuous semantic space, the first many-to-many augmentation method for multi-turn dialogues. Specifically, we map a dialogue to our extended Brownian Bridge, a special Gaussian process. We sample latent variables to form coherent dialogue paths in the continuous space. A dialogue path corresponds to a new multi-turn dialogue and is used as augmented training data. We show the effect of DialoGPS with both automatic and human evaluation.
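For reference, the standard (non-extended) Brownian bridge that the abstract builds on has a closed-form Gaussian marginal. The expression below is the textbook bridge pinned at $z_0$ at time $0$ and at $z_T$ at time $T$, with diffusion scale $\sigma$; the symbols are our own, and the specific extension DialoGPS makes to this process is not described in this summary.

```latex
z_t \sim \mathcal{N}\!\left( \left(1 - \frac{t}{T}\right) z_0 + \frac{t}{T}\, z_T,\; \frac{t\,(T - t)}{T}\,\sigma^2 I \right), \qquad 0 \le t \le T
```

The variance is zero at both endpoints and largest at $t = T/2$, so under this standard bridge the intermediate points vary the most while the path stays anchored to its start and end.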

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses a problem in open-domain multi-turn dialogue generation: in most datasets, contexts and responses are mapped one-to-one, which violates the many-to-many nature of real conversations. One context can lead to multiple different responses, and one response can answer multiple different contexts. Models trained on one-to-one data generalize poorly and tend to produce safe but uninteresting responses. To address this, the paper proposes DialoGue Path Sampling (DialoGPS), which augments multi-turn dialogue datasets by sampling coherent dialogue paths in a continuous semantic space. The method proceeds in three steps:

1. Map a dialogue to an extended Brownian Bridge, a special Gaussian process, so that each turn corresponds to a point in continuous semantic space.
2. Sample latent variables along the bridge to form new, coherent dialogue paths.
3. Treat each sampled path as a new multi-turn dialogue and use it as augmented training data.

By generating diverse dialogue paths, DialoGPS aims to improve the model's generalization ability and the quality of its generated responses. According to the paper, DialoGPS outperforms strong existing baselines in both automatic and human evaluations.
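The sampling step (step 2) can be illustrated with a minimal sketch, assuming each turn has already been encoded into a fixed-length latent vector and that the latents of the first and last turns serve as the bridge endpoints. The function `sample_bridge_path` and its parameters are hypothetical, and it uses the standard Brownian bridge rather than the paper's extended version; the encoder that produces the endpoints and the decoder that turns a sampled path back into text are omitted. Each interior point is drawn conditioned on the previous point and on the end point, which yields one continuous path instead of independent per-turn samples.

```python
import random
from typing import List, Optional, Sequence


def sample_bridge_path(
    z_start: Sequence[float],
    z_end: Sequence[float],
    num_turns: int,
    sigma: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Draw one path z_0, ..., z_T from a standard Brownian bridge.

    Hypothetical helper: turn i is placed at time t = i, so T = num_turns - 1.
    The path is pinned at z_start (t = 0) and z_end (t = T).
    """
    if num_turns < 2:
        raise ValueError("need at least two turns (start and end)")
    if len(z_start) != len(z_end):
        raise ValueError("endpoint dimensions differ")
    if rng is None:
        rng = random.Random()

    T = num_turns - 1
    path = [list(z_start)]
    # Sample interior times 1..T-1 one step at a time.
    for s in range(T - 1):
        prev = path[-1]
        remaining = T - s  # time left from the current point to T
        # Bridge from (s, prev) to (T, z_end), evaluated at s + 1:
        #   mean = prev + (z_end - prev) / remaining
        #   var  = sigma^2 * (remaining - 1) / remaining
        std = (sigma ** 2 * (remaining - 1) / remaining) ** 0.5
        nxt = [
            a + (b - a) / remaining + rng.gauss(0.0, std)
            for a, b in zip(prev, z_end)
        ]
        path.append(nxt)
    path.append(list(z_end))
    return path


if __name__ == "__main__":
    # Usage: five "turns" in a 2-D latent space, seeded for reproducibility.
    for z in sample_bridge_path([0.0, 0.0], [1.0, 2.0], num_turns=5,
                                sigma=0.3, rng=random.Random(0)):
        print([round(v, 3) for v in z])
```

With `sigma=0.0` the sampler collapses to linear interpolation between the endpoints (for endpoints `[0, 0]` and `[1, 2]` over five turns, the interior points are `[0.25, 0.5]`, `[0.5, 1.0]`, and `[0.75, 1.5]`); larger `sigma` values give more varied interior points, which is the only source of diversity in this sketch.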
