Abstract: Dialogue summarization aims to provide a concise and coherent summary of conversations between multiple speakers. While recent advances in language models have improved this process, summarizing dialogues accurately and faithfully remains challenging because it requires understanding speaker interactions and capturing the relevant information. Indeed, abstractive models used for dialogue summarization may generate summaries that contain inconsistencies. We propose applying the semantic information defined for Spoken Language Understanding (SLU) in human-machine dialogue systems to goal-oriented human-human dialogues, in order to obtain summaries that are more semantically faithful to the task. This study makes three key contributions. First, we explore how incorporating task-related information can enhance the summarization process, leading to more semantically accurate summaries. Second, we introduce a new evaluation criterion based on task semantics. Finally, we propose a new version of the dataset with more annotated data, standardized for research on task-oriented dialogue summarization. We evaluate these methods on the DECODA corpus, a collection of French spoken dialogues from a call center. Results show that integrating task-related information into the models improves summary accuracy, even under varying word error rates.
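The abstract does not spell out how the task-semantic criterion is computed. The following is a minimal sketch, assuming it compares task semantics of generated summaries against references in two ways (the metrics reported as CT-Acc and NE-F1): agreement of a predicted call type with the gold call type, and F1 overlap of domain named entities. The function names, the call-type labels, and the assumption that entities are already extracted as strings are all hypothetical; the paper's exact formulation may differ.

```python
from collections import Counter
from typing import Iterable, Sequence


def call_type_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of summaries whose predicted call type matches the gold label.

    `predicted` is assumed to come from a call-type classifier applied to the
    generated summaries (hypothetical input; any classifier could supply it).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


def entity_f1(generated: Iterable[str], reference: Iterable[str]) -> float:
    """Multiset F1 between entities found in a generated and a reference summary."""
    gen, ref = Counter(generated), Counter(reference)
    overlap = sum((gen & ref).values())  # Counter & keeps the minimum count per entity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical call types and entities, for illustration only.
    print(call_type_accuracy(["horaires", "itineraire"], ["horaires", "objets_perdus"]))  # 0.5
    print(entity_f1(["ligne 7", "Nation"], ["ligne 7", "Bastille"]))  # 0.5
```

Using a multiset intersection means an entity repeated in the summary only counts as correct as many times as it appears in the reference.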

What problem does this paper attempt to address?

The paper addresses the problem of improving faithfulness in human-human dialogue summarization by leveraging semantic information from Spoken Language Understanding (SLU) tasks. The authors point out that while current dialogue summarization methods have improved in fluency and coherence, they still struggle with faithfulness: they may generate information that is inconsistent with the original dialogue (i.e., "hallucinations"). The paper therefore proposes to improve the faithfulness of dialogue summaries by integrating task-related semantic information, such as call intents and domain-specific named entities.

### Main Contributions

1. **Exploration of Task-Related Information Integration**: Investigate how incorporating task-related semantic information (such as call intents and named entities) into the dialogue summarization process improves the semantic accuracy of the summaries.
2. **Introduction of New Evaluation Metrics**: Propose a new evaluation metric based on task semantics to measure the faithfulness of the summaries.
3. **Proposition of a New Dataset Version**: Release a new version of the dataset with more annotated data, standardized for task-oriented dialogue summarization research.

### Experimental Setup

- **Dataset**: The DECODA corpus, a French call-center dialogue dataset.
- **Models**: BARThez (a pre-trained French sequence-to-sequence model) for automatic summarization, and CamemBERT-base for call type classification.
- **Data Augmentation**: Additional training data generated with large language models (such as ChatGPT-3.5), and automatic transcriptions produced with automatic speech recognition systems (such as WhisperX).

### Results

- **Effect of Data Augmentation**: Data augmentation improved summary quality, especially on the ROUGE and BERTScore metrics.
- **Impact of Selection Criteria**: Selection criteria based on KL divergence and Named Entity Hallucination Risk (NEHR) further improve the faithfulness of the summaries, particularly call type classification accuracy (CT-Acc) and named entity F1 score (NE-F1).

### Conclusion

Integrating task-related semantic information improves the faithfulness of dialogue summaries. The proposed method performs well on automatic evaluation metrics, and its gains hold under varying word error rates, which matters for real-world call-center applications where transcripts come from ASR. Future work could explore other types of semantic information, such as semantic frames, to further improve the faithfulness and accuracy of the summaries.
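The summary names KL divergence and NEHR as selection criteria for augmented data but does not define them. The following is a minimal sketch of one plausible reading, not the authors' method: it assumes the KL divergence is taken between the call-type distribution a classifier predicts for the dialogue and for a candidate summary, and that NEHR is the share of summary entities that never occur in the dialogue. The dictionary keys (`dialogue_probs`, `summary_probs`, `dialogue_entities`, `summary_entities`) and both thresholds are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same call types.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid
    dividing by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def entity_hallucination_risk(summary_entities: Set[str], dialogue_entities: Set[str]) -> float:
    """Share of summary entities absent from the dialogue (assumed NEHR definition)."""
    if not summary_entities:
        return 0.0
    unsupported = summary_entities - dialogue_entities
    return len(unsupported) / len(summary_entities)


def select_candidates(candidates: List[Dict], max_kl: float = 0.1, max_risk: float = 0.0) -> List[Dict]:
    """Keep synthetic summaries whose call-type distribution and entities stay close to the dialogue.

    Thresholds are illustrative, not values from the paper.
    """
    kept = []
    for c in candidates:
        kl = kl_divergence(c["dialogue_probs"], c["summary_probs"])
        risk = entity_hallucination_risk(c["summary_entities"], c["dialogue_entities"])
        if kl <= max_kl and risk <= max_risk:
            kept.append(c)
    return kept
```

With the default `max_risk=0.0`, any candidate summary that mentions an entity not found in its dialogue is discarded, which is the strictest possible filter; a looser threshold trades faithfulness for more training data.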

Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks

Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling

Analyzing and Evaluating Faithfulness in Dialogue Summarization

Human-in-the-loop Abstractive Dialogue Summarization

Enhancing Abstractive Dialogue Summarization with Internal Knowledge

Improving Abstractive Dialogue Summarization with Speaker-Aware Supervised Contrastive Learning

DialogSum Challenge: Results of the Dialogue Summarization Shared Task

DialSummEval: Revisiting Summarization Evaluation for Dialogues

An Exploratory Study on Long Dialogue Summarization: What Works and What's Next

Hierarchical Summarization for Longform Spoken Dialog

Enhancing Semantic Understanding with Self-supervised Methods for Abstractive Dialogue Summarization

Leverage Unlabeled Data for Abstractive Speech Summarization with Self-Supervised Learning and Back-Summarization

A Survey on Dialogue Summarization: Recent Advances and New Frontiers

DialogSum Challenge: Summarizing Real-Life Scenario Dialogues

LLM aided semi-supervision for Extractive Dialog Summarization

MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization

CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization

Tell me what I need to know: Exploring LLM-based (Personalized) Abstractive Multi-Source Meeting Summarization

Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection

Dialog Summarization for Software Collaborative Platform Via Tuning Pre-Trained Models

CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning