Think Twice: A Human-like Two-stage Conversational Agent for Emotional Response Generation

Yushan Qian, Bo Wang, Shangzhao Ma, Wu Bin, Shuo Zhang, Dongming Zhao, Kun Huang, Yuexian Hou
2024-10-01
Abstract: Towards human-like dialogue systems, current emotional dialogue approaches jointly model emotion and semantics with a unified neural network. This strategy tends to generate safe responses due to the mutual restriction between emotion and semantics, and it requires large-scale emotion-annotated dialogue corpora, which are rare. Inspired by the "think twice" behavior in human dialogue, we propose a two-stage conversational agent for the generation of emotional dialogue. Firstly, a dialogue model trained without the emotion-annotated dialogue corpus generates a prototype response that meets the contextual semantics. Secondly, the first-stage prototype is modified by a controllable emotion refiner with the empathy hypothesis. Experimental results on the DailyDialog and EmpatheticDialogues datasets demonstrate that the proposed conversational agent outperforms the comparison models in emotion generation and maintains the semantic performance in automatic and human evaluations.
Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
The paper addresses how to generate open-domain dialogue responses that are both contextually appropriate and emotionally suitable. Current emotional dialogue systems typically model emotion and semantics jointly in a single neural network. This approach tends to produce safe, generic responses and faces the following challenges:

1. **Emotional signals are weakened during learning**: In deep neural networks, the input emotional signal tends to fade over the course of a complex training process.
2. **Emotion-enhancing design limits semantic performance**: In joint generation models, designs aimed at strengthening emotion often constrain the semantic quality of the generated responses, resulting in overly conservative responses.
3. **Large-scale emotion-annotated dialogue datasets are scarce**: Joint training of semantics and emotion needs large emotion-annotated dialogue corpora, and very few exist.

To address these challenges, the paper proposes a two-stage conversational agent that generates emotional dialogue by imitating the "think twice" (rethinking) strategy of human dialogue behavior. The model is divided into two stages:

1. **Prototype response generation**: First, a pre-trained dialogue model generates a prototype response that conforms to the contextual semantics. This stage does not require emotion-annotated data.
2. **Controllable emotional refinement**: Then, a controllable emotional refinement module modifies the prototype response so that the final response is both contextually appropriate and expresses a suitable emotion. This stage includes two sub-modules:
   - **Rewrite module**: Adjusts the emotional attributes of the response by replacing the original emotional words or phrases.
   - **Add module**: Adjusts the emotional type of the response by adding additional sentences.
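The two-stage flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the stub `generate_prototype`, the small word lexicon, and the fixed appended sentences are all hypothetical stand-ins for the trained dialogue model and the learned refiner. Passing the target emotion in from the caller and applying both sub-modules in sequence are also assumptions made for illustration.

```python
import string

# Hypothetical toy lexicon: word -> replacement, per target emotion.
REWRITE_LEXICON = {
    "happy": {"okay": "wonderful", "fine": "great"},
    "sad": {"okay": "upsetting", "fine": "rough"},
}

# Hypothetical sentences that the add step can append, per target emotion.
ADD_SENTENCES = {
    "happy": "I am so glad to hear that!",
    "sad": "I am sorry you are going through this.",
}


def generate_prototype(context: str) -> str:
    """Stage 1 stand-in: a real system would query a dialogue model here.

    No emotion label is used at this stage.
    """
    return "That sounds okay."


def rewrite(response: str, emotion: str) -> str:
    """Replace words found in the lexicon with emotion-bearing alternatives."""
    swaps = REWRITE_LEXICON.get(emotion, {})
    out = []
    for token in response.split():
        core = token.rstrip(string.punctuation)  # word without trailing punctuation
        tail = token[len(core):]                 # the punctuation that was stripped
        out.append(swaps.get(core.lower(), core) + tail)
    return " ".join(out)


def add(response: str, emotion: str) -> str:
    """Append an extra sentence carrying the target emotion, if one is known."""
    extra = ADD_SENTENCES.get(emotion)
    return f"{response} {extra}" if extra else response


def respond(context: str, emotion: str) -> str:
    """Stage 1 produces a prototype; stage 2 refines it toward the emotion."""
    prototype = generate_prototype(context)
    return add(rewrite(prototype, emotion), emotion)


if __name__ == "__main__":
    print(respond("I passed my exam!", "happy"))
```

The point of the sketch is the separation of concerns: the first function never sees an emotion label, and the refinement step never needs the dialogue context, so each part can be swapped out independently.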
Experimental results on the DailyDialog and EmpatheticDialogues datasets show that the two-stage conversational agent outperforms the comparison models in emotion generation while maintaining semantic performance, in both automatic and human evaluations.