Abstract: Dialogue response generation requires an agent to generate a response according to the current dialogue history. Two-party dialogues have been well studied in this setting, but a large gap remains for multi-party dialogues. Unlike two-party dialogues, where each response is a direct reply to the previous utterance, the multi-party scenario requires the addressee of a response utterance to be specified before the response is generated. Thanks to the huge amount of two-party conversational data, various pre-trained language models for two-party dialogue response generation have been proposed. However, because multi-party dialogue datasets lack annotated addressee labels, it is hard to use them to pre-train a response generation model for multi-party dialogues. To tackle this obstacle, we propose an Expectation-Maximization (EM) approach that iteratively performs expectation steps to generate addressee labels and maximization steps to optimize a response generation model. Theoretical analyses and extensive experiments justify the feasibility and effectiveness of the proposed method.
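
The two alternating steps can be written in the generic textbook EM form. The notation here is an assumption for illustration and is not taken from the paper: $c$ is the dialogue history, $r$ the response, $z$ the unobserved addressee, and $\theta$ the parameters of the response generation model.

$$
\text{E-step:}\quad q(z) = p(z \mid c, r; \theta) = \frac{p(r \mid c, z; \theta)\, p(z \mid c)}{\sum_{z'} p(r \mid c, z'; \theta)\, p(z' \mid c)}
$$

$$
\text{M-step:}\quad \theta \leftarrow \arg\max_{\theta} \sum_{(c, r)} \mathbb{E}_{z \sim q}\big[\log p(r \mid c, z; \theta)\big]
$$

Under this reading, the expectation step scores each candidate addressee by how well it explains the observed response, and the maximization step retrains the generator on the resulting (soft or hard) addressee labels.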

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses response generation in multi-turn, multi-party conversations. Unlike the extensively studied two-party dialogues, each response in a multi-party conversation must be generated from the current dialogue history and must clearly specify its addressee (i.e., the recipient). However, large-scale multi-party dialogue datasets lack annotated addressee labels, which makes it challenging to use them directly for pre-training a multi-party dialogue response generation model.

To solve this problem, the authors propose an Expectation-Maximization (EM) based method. The method iteratively generates addressee labels in the expectation step and optimizes the response generation model in the maximization step. In this way, it can pre-train on large-scale multi-party dialogue datasets without annotated addressee labels, improving the quality of the generated responses.

### Main Contributions

1. **First Study on Pre-training for Multi-party Dialogue Response Generation**: This is the first attempt to pre-train response generation for multi-party dialogues, which are more complex and challenging than two-party dialogues.
2. **Proposed EM Method to Alleviate Data Scarcity**: The EM method allows pre-training on large-scale datasets without annotated addressee labels, addressing the issue of data scarcity.
3. **Theoretical Analysis and Experimental Validation**: The paper provides a theoretical analysis supporting the feasibility of the EM pre-training method, and experimental results on the Ubuntu IRC benchmark dataset show that the pre-trained model achieves state-of-the-art performance on multiple metrics.

### Method Overview

1. **Task Definition**: Given an input sequence containing the dialogue history and the responder, along with the addressee of the response, the goal is to train a model to generate the corresponding response.
2. **Addressee Modeling**: Addressee embeddings are added to the word embeddings and positional encodings, so that addressee information is integrated into the response generation process.
3. **Latent Variable Prediction**: In the expectation step, the distribution over the unannotated addressee is calculated given the current dialogue context and response.
4. **EM Process**: The expectation step and the maximization step are performed alternately to gradually improve the model.

### Experimental Results

1. **Automatic Evaluation**: On the Ubuntu IRC benchmark dataset, the proposed model outperforms existing methods on multiple automatic evaluation metrics (such as BLEU, METEOR, and ROUGE-L).
2. **Human Evaluation**: Human evaluation further verifies that the responses generated by the model are of high quality in terms of relevance, fluency, and informativeness.

### Conclusion

By proposing an EM-based pre-training method, this paper addresses the data scarcity issue in multi-party dialogue response generation and achieves state-of-the-art performance on multiple evaluation metrics. This provides strong support for future applications in multi-party dialogues.
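
The alternation between the expectation and maximization steps described in the method overview can be illustrated with a deliberately tiny sketch. This is not the paper's model: as a stand-in for a neural generator, it assumes a lookup table `theta[a][r]` giving the probability of response topic `r` when the addressed utterance has topic `a`, a uniform prior over addressees, and a made-up dataset `DATA` in which each response stays on the topic of the utterance it replies to. All names (`e_step`, `m_step`, `run_em`, `DATA`) are hypothetical.

```python
# Toy EM with a latent addressee. Each sample is (context, response):
# `context` is a list of utterance topic ids, `response` is a topic id.
# The addressee (which context utterance is being replied to) is unobserved.

DATA = [
    ([0, 1], 0), ([0, 1], 1),
    ([1, 2], 1), ([1, 2], 2),
    ([0, 2], 0), ([0, 2], 2),
]


def e_step(theta, context, response):
    """Posterior over addressee positions, assuming a uniform prior."""
    scores = [theta[topic][response] for topic in context]
    total = sum(scores)
    return [s / total for s in scores]


def m_step(data, posteriors, num_topics, smoothing=0.1):
    """Re-estimate the response table from posterior-weighted counts."""
    counts = [[smoothing] * num_topics for _ in range(num_topics)]
    for (context, response), post in zip(data, posteriors):
        for topic, weight in zip(context, post):
            counts[topic][response] += weight
    return [[c / sum(row) for c in row] for row in counts]


def run_em(data, num_topics, iterations=20):
    """Alternate E and M steps, starting from a uniform response table."""
    theta = [[1.0 / num_topics] * num_topics for _ in range(num_topics)]
    posteriors = []
    for _ in range(iterations):
        posteriors = [e_step(theta, c, r) for c, r in data]   # E-step
        theta = m_step(data, posteriors, num_topics)          # M-step
    return theta, posteriors


if __name__ == "__main__":
    theta, posteriors = run_em(DATA, num_topics=3)
    for (context, response), post in zip(DATA, posteriors):
        addressee = post.index(max(post))  # hard label from the posterior
        print(context, response, "-> addressee position", addressee)
```

In this sketch the first expectation step is uninformative because the table starts uniform; the first maximization step then picks up co-occurrence statistics, and later iterations sharpen the addressee posteriors. A real system would replace the table with a pre-trained generator and may use hard labels rather than soft weights.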

EM Pre-training for Multi-party Dialogue Response Generation
