EM Pre-training for Multi-party Dialogue Response Generation

Yiyang Li, Hai Zhao
2023-05-21
Abstract: Dialogue response generation requires an agent to generate a response according to the current dialogue history, in terms of which two-party dialogues have been well studied, but leaving a great gap for multi-party dialogues at the same time. Different from two-party dialogues where each response is a direct reply to its previous utterance, the addressee of a response utterance should be specified before it is generated in the multi-party scenario. Thanks to the huge amount of two-party conversational data, various pre-trained language models for two-party dialogue response generation have been proposed. However, due to the lack of annotated addressee labels in multi-party dialogue datasets, it is hard to use them to pre-train a response generation model for multi-party dialogues. To tackle this obstacle, we propose an Expectation-Maximization (EM) approach that iteratively performs the expectation steps to generate addressee labels, and the maximization steps to optimize a response generation model. Theoretical analyses and extensive experiments have justified the feasibility and effectiveness of our proposed method.
Computation and Language
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses response generation in multi-party conversations. In the extensively studied two-party setting, each response is a direct reply to the previous utterance. In a multi-party conversation, by contrast, the addressee (the recipient of the response) must be specified before the response is generated. Large amounts of two-party conversational data have made pre-trained response generation models possible for that setting, but multi-party dialogue datasets lack annotated addressee labels, so they are hard to use directly for pre-training a multi-party response generation model.

To solve this problem, the authors propose an Expectation-Maximization (EM) method. It alternates between an expectation step, which generates addressee labels, and a maximization step, which optimizes the response generation model. In this way, the model can be pre-trained on large-scale multi-party dialogue data that has no annotated addressee labels.

### Main Contributions

1. **First Study on Pre-training for Multi-party Dialogue Response Generation**: This is the first attempt to pre-train a response generation model for multi-party dialogues, which are more complex and challenging than two-party dialogues.
2. **Proposed EM Method to Alleviate Data Scarcity**: The EM method allows pre-training on large-scale datasets without annotated addressee labels, which alleviates the scarcity of labeled data.
3. **Theoretical Analysis and Experimental Validation**: The paper provides a theoretical analysis that justifies the feasibility of EM pre-training, and experiments on the Ubuntu IRC benchmark dataset show that the pre-trained model achieves state-of-the-art performance on multiple metrics.

### Method Overview

1. **Task Definition**: Given an input sequence containing the dialogue history and the responder, along with the addressee of the response, the goal is to train a model to generate the corresponding response.
2. **Addressee Modeling**: Addressee embeddings are combined with the word embeddings and positional encodings, so that addressee information is available to the response generation process.
3. **Latent Variable Prediction**: The unannotated addressee is treated as a latent variable. In the expectation step, its distribution is computed given the current dialogue context and the response.
4. **EM Process**: The expectation step and the maximization step are performed alternately to gradually improve the model.

### Experimental Results

1. **Automatic Evaluation**: On the Ubuntu IRC benchmark dataset, the proposed model outperforms existing methods on multiple automatic evaluation metrics (such as BLEU, METEOR, and ROUGE-L).
2. **Human Evaluation**: Human evaluation further indicates that the generated responses are of high quality in terms of relevance, fluency, and informativeness.

### Conclusion

By proposing an EM-based pre-training method, this paper alleviates the data scarcity issue in multi-party dialogue response generation and achieves state-of-the-art performance on multiple evaluation metrics, which provides a basis for future work on multi-party dialogue applications.
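The alternation between the expectation step and the maximization step described in the method overview can be illustrated with a deliberately small sketch. It is not the authors' implementation: the neural response generator is replaced by a hypothetical lookup table `theta[a][r]` (the probability of a response with topic `r` given an addressee utterance with topic `a`), each utterance is reduced to an assumed integer topic id, and the prior over candidate addressees is assumed to be uniform. The data in `CONTEXTS` and `RESPONSES` is made up for illustration.

```python
import math
import random

# Hypothetical toy data: each context lists the topic ids of the candidate
# addressee utterances; each response is a single topic id. The true
# addressee is never observed.
CONTEXTS = [[0, 1], [1, 2], [0, 2], [2, 1], [1, 0], [2, 0]]
RESPONSES = [0, 1, 2, 2, 1, 0]


def e_step(theta, contexts, responses):
    """Posterior over the latent addressee index for every dialogue."""
    posteriors = []
    for ctx, resp in zip(contexts, responses):
        # With a uniform prior, the posterior is proportional to the
        # response likelihood under each candidate addressee.
        scores = [theta[topic][resp] for topic in ctx]
        total = sum(scores)
        posteriors.append([s / total for s in scores])
    return posteriors


def m_step(contexts, responses, posteriors, num_topics):
    """Re-estimate theta from posterior-weighted (addressee, response) counts."""
    counts = [[0.0] * num_topics for _ in range(num_topics)]
    for ctx, resp, post in zip(contexts, responses, posteriors):
        for topic, weight in zip(ctx, post):
            counts[topic][resp] += weight
    theta = []
    for row in counts:
        total = sum(row)
        if total > 0:
            theta.append([c / total for c in row])
        else:
            # Topic never received any weight: fall back to a uniform row.
            theta.append([1.0 / num_topics] * num_topics)
    return theta


def log_likelihood(theta, contexts, responses):
    """Marginal log-likelihood of the responses, summing out the addressee."""
    ll = 0.0
    for ctx, resp in zip(contexts, responses):
        marginal = sum(theta[topic][resp] for topic in ctx) / len(ctx)
        ll += math.log(marginal)
    return ll


def run_em(contexts, responses, num_topics, num_iters=20, seed=0):
    """Alternate E and M steps; return theta, final posteriors, and the
    log-likelihood recorded before training and after every iteration."""
    rng = random.Random(seed)
    theta = []
    for _ in range(num_topics):
        row = [rng.uniform(0.5, 1.5) for _ in range(num_topics)]
        total = sum(row)
        theta.append([v / total for v in row])
    history = [log_likelihood(theta, contexts, responses)]
    for _ in range(num_iters):
        posteriors = e_step(theta, contexts, responses)
        theta = m_step(contexts, responses, posteriors, num_topics)
        history.append(log_likelihood(theta, contexts, responses))
    return theta, e_step(theta, contexts, responses), history


if __name__ == "__main__":
    theta, posteriors, history = run_em(CONTEXTS, RESPONSES, num_topics=3)
    print("log-likelihood per iteration:", [round(v, 4) for v in history])
    print("addressee posteriors:", posteriors)
```

In this sketch the marginal log-likelihood recorded in `history` should never decrease from one iteration to the next, which is the standard EM guarantee. In the setting the paper describes, the lookup table would be replaced by the response generation model, and the M-step by gradient updates on that model.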