Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks

Eunice Akani, Benoit Favre, Frederic Bechet, Romain Gemignani
2024-09-16
Abstract: Dialogue summarization aims to provide a concise and coherent summary of conversations between multiple speakers. While recent advancements in language models have enhanced this process, summarizing dialogues accurately and faithfully remains challenging due to the need to understand speaker interactions and capture relevant information. Indeed, abstractive models used for dialog summarization may generate summaries that contain inconsistencies. We suggest using the semantic information proposed for performing Spoken Language Understanding (SLU) in human-machine dialogue systems for goal-oriented human-human dialogues to obtain a more semantically faithful summary regarding the task. This study introduces three key contributions: First, we propose an exploration of how incorporating task-related information can enhance the summarization process, leading to more semantically accurate summaries. Then, we introduce a new evaluation criterion based on task semantics. Finally, we propose a new dataset version with increased annotated data standardized for research on task-oriented dialogue summarization. The study evaluates these methods using the DECODA corpus, a collection of French spoken dialogues from a call center. Results show that integrating models with task-related information improves summary accuracy, even with varying word error rates.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the faithfulness of summaries of human-human dialogues, using semantic information from Spoken Language Understanding (SLU) tasks. The authors point out that current dialogue summarization methods, although increasingly fluent and coherent, can still generate content that is inconsistent with the original dialogue (i.e., "hallucinations"). The paper therefore proposes integrating task-related semantic information, such as call intents and domain-specific named entities, to make dialogue summaries more faithful.

### Main Contributions:
1. **Exploration of Task-Related Information Integration**: Investigate how to incorporate task-related semantic information (such as call intents and named entities) into the dialogue summarization process to improve the semantic accuracy of the summaries.
2. **Introduction of a New Evaluation Metric**: Propose a new evaluation criterion based on task semantics to measure the faithfulness of the summaries.
3. **Proposition of a New Dataset Version**: Provide a new version of the dataset with more annotated data, standardized for research on task-oriented dialogue summarization.

### Experimental Setup:
- **Dataset**: The DECODA corpus, a collection of French spoken dialogues from a call center.
- **Model**: BARThez (a pre-trained French text generation model) for automatic summarization and CamemBERT-base for call type classification.
- **Data Augmentation**: Additional training data generated with large language models (such as ChatGPT-3.5), and automatic transcriptions produced with automatic speech recognition systems (such as WhisperX).

### Results:
- **Effect of Data Augmentation**: Data augmentation improved summary quality as measured by ROUGE and BERTScore.
- **Impact of Selection Criteria**: Selection criteria based on KL divergence and Named Entity Hallucination Risk (NEHR) further improve the faithfulness of the summaries, particularly in call type classification accuracy (CT-Acc) and named entity F1 score (NE-F1).

### Conclusion:
Integrating task-related semantic information improves the faithfulness of dialogue summaries. The proposed method performs well on automatic evaluation metrics, and the abstract reports that the gains hold even with varying word error rates, which matters for real-world use on automatic transcripts. Future work can explore other types of semantic information, such as semantic frames, to further enhance the faithfulness and accuracy of the summaries.
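
The entity-based measures mentioned in the results (NE-F1 and a named entity hallucination risk) can be illustrated with a short sketch. This page does not give the paper's exact definitions, so the code below is an assumption: it treats NE-F1 as a set-based F1 over normalized entity strings and approximates hallucination risk as the share of summary entities that never appear in the dialogue. The function names, the `pick_most_faithful` selection step, and the example entities are hypothetical and only serve as illustration.

```python
def _normalize(entities):
    """Lowercase and strip entity strings, returning them as a set."""
    return {e.strip().lower() for e in entities}


def entity_f1(reference_entities, summary_entities):
    """Set-based precision, recall, and F1 between two entity lists.

    Assumption: entities are compared as exact normalized strings.
    """
    ref = _normalize(reference_entities)
    hyp = _normalize(summary_entities)
    if not ref and not hyp:
        return 1.0, 1.0, 1.0  # nothing expected, nothing produced
    overlap = len(ref & hyp)
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def unsupported_entity_rate(dialogue_entities, summary_entities):
    """Share of summary entities absent from the dialogue.

    Hypothetical stand-in for a hallucination-risk score: 0.0 means
    every entity in the summary is supported by the dialogue.
    """
    src = _normalize(dialogue_entities)
    hyp = _normalize(summary_entities)
    if not hyp:
        return 0.0
    return len(hyp - src) / len(hyp)


def pick_most_faithful(dialogue_entities, candidates):
    """Return the candidate summary with the lowest unsupported rate.

    `candidates` is a list of (summary_text, summary_entities) pairs.
    """
    return min(
        candidates,
        key=lambda c: unsupported_entity_rate(dialogue_entities, c[1]),
    )[0]


if __name__ == "__main__":
    # Invented example values, not taken from the DECODA corpus.
    dialogue = ["Navigo", "ligne 4"]
    summary = ["navigo", "ligne 13"]
    print(entity_f1(dialogue, summary))                # (0.5, 0.5, 0.5)
    print(unsupported_entity_rate(dialogue, summary))  # 0.5
```

In this sketch, a candidate summary that mentions only entities found in the dialogue scores 0.0 and would be preferred by `pick_most_faithful`. How the paper combines this kind of score with the KL-divergence criterion is not specified here.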