A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation

Wei-Nan Zhang, Yiming Cui, Kaiyan Zhang, Yifa Wang, Qingfu Zhu, Lingzhi Li, Ting Liu
DOI: https://doi.org/10.1145/3522763
2024-10-28
Abstract: Recently, research on open domain dialogue systems has attracted extensive interest from academic and industrial researchers. The goal of an open domain dialogue system is to imitate humans in conversation. Previous work on single turn conversation generation has greatly advanced research on open domain dialogue systems. However, understanding multiple single turn conversations is not equivalent to understanding a multi turn dialogue, because human dialogue is coherent and context dependent. Therefore, in open domain multi turn dialogue generation, it is essential to model the contextual semantics of the dialogue history rather than rely only on the last utterance. Previous research has verified the effectiveness of the hierarchical recurrent encoder-decoder framework on open domain multi turn dialogue generation. However, using an RNN-based model to hierarchically encode the utterances into a representation of the dialogue history still faces the vanishing gradient problem. To address this issue, in this paper we propose a static and dynamic attention-based approach to model the dialogue history and then generate open domain multi turn dialogue responses. Experimental results on the Ubuntu and Opensubtitles datasets verify the effectiveness of the proposed static and dynamic attention-based approach on automatic and human evaluation metrics in various experimental settings. Meanwhile, we also empirically verify the performance of combining the static and dynamic attentions on open domain multi turn dialogue generation.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to model the contextual semantics of the dialogue history in open-domain multi-turn dialogue generation so that responses are coherent and diverse. Single-turn dialogue generation has progressed significantly, but understanding several single-turn exchanges is not equivalent to understanding a multi-turn dialogue, because human conversation is coherent and context dependent. Relying only on the last turn is therefore insufficient; the model needs the contextual information of the entire dialogue history. To this end, the paper proposes a framework that combines static and dynamic attention mechanisms to model the dialogue history and generate multi-turn responses. The framework has three components:

1. **Static Attention**: Calculates an importance weight for each dialogue turn and keeps these weights unchanged during decoding.
2. **Dynamic Attention**: Updates the weight of each dialogue turn at every step of decoding to adapt to the current decoding context.
3. **Hybrid Attention**: Combines static and dynamic attention in various ways (such as concatenation, summation, linear interpolation, and element-wise pooling) to enhance the model's performance.

Experimental results on the Ubuntu and Opensubtitles datasets show that the framework is effective on both automatic and human evaluation metrics.
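The difference between the three attention variants described above can be illustrated with a minimal sketch. This is not the paper's implementation: the dot-product scoring function, the choice of the last utterance as the static query, the use of max for element-wise pooling, and all function names (`static_context`, `dynamic_context`, `combine`) are assumptions made for illustration. Utterance representations are plain lists of floats standing in for encoder outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, utterances):
    """Dot-product attention (assumed scoring function, not from the paper).

    Returns the attention weights and the weighted sum of utterance vectors.
    """
    weights = softmax([dot(query, u) for u in utterances])
    dim = len(utterances[0])
    context = [sum(w * u[i] for w, u in zip(weights, utterances)) for i in range(dim)]
    return weights, context


def static_context(utterances):
    # Static attention: weights are computed once and reused at every
    # decoding step. Querying with the last utterance is an assumption.
    return attend(utterances[-1], utterances)


def dynamic_context(decoder_state, utterances):
    # Dynamic attention: weights are recomputed at each decoding step
    # from the current decoder state.
    return attend(decoder_state, utterances)


def combine(static_vec, dynamic_vec, mode="sum", alpha=0.5):
    """Hybrid attention: the four combination styles named in the summary."""
    if mode == "concat":
        return static_vec + dynamic_vec
    if mode == "sum":
        return [s + d for s, d in zip(static_vec, dynamic_vec)]
    if mode == "interpolate":
        return [alpha * s + (1 - alpha) * d for s, d in zip(static_vec, dynamic_vec)]
    if mode == "max_pool":  # assumed pooling operator
        return [max(s, d) for s, d in zip(static_vec, dynamic_vec)]
    raise ValueError(f"unknown mode: {mode}")


# Toy dialogue history of three utterance vectors (hypothetical values).
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
static_weights, static_vec = static_context(history)  # computed once

# Two hypothetical decoder states, one per decoding step.
for decoder_state in ([2.0, 0.0], [0.0, 2.0]):
    dynamic_weights, dynamic_vec = dynamic_context(decoder_state, history)
    hybrid = combine(static_vec, dynamic_vec, mode="interpolate", alpha=0.5)
    print(static_weights, dynamic_weights, hybrid)
```

In this sketch the static weights stay the same across both loop iterations, while the dynamic weights shift toward whichever utterances align with the current decoder state; the hybrid vector is what a decoder would consume at each step.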