Abstract: Abstractive dialogue summarization is the task of distilling conversations into informative and concise summaries. Although reviews have been conducted on this topic, there is a lack of comprehensive work detailing the challenges of dialogue summarization, unifying the differing understanding of the task, and aligning proposed techniques, datasets, and evaluation metrics with the challenges. This article summarizes the research on Transformer-based abstractive summarization for English dialogues by systematically reviewing 1262 unique research papers published between 2019 and 2024, relying on the Semantic Scholar and DBLP databases. We cover the main challenges present in dialogue summarization (i.e., language, structure, comprehension, speaker, salience, and factuality) and link them to corresponding techniques such as graph-based approaches, additional training tasks, and planning strategies, which typically over-rely on BART-based encoder-decoder models. We find that while some challenges, like language, have seen considerable progress, mainly due to training methods, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities. We investigate how these approaches are typically assessed, covering the datasets for the subdomains of dialogue (e.g., meeting, medical), the established automatic metrics, and human evaluation approaches for assessing scores and annotator agreement. We observe that only a few datasets span all subdomains. The ROUGE metric is the most used, while human evaluation is frequently reported without sufficient detail on inter-annotator agreement and annotation guidelines. Additionally, we discuss the possible implications of the recently explored large language models and conclude that despite a potential shift in relevance and difficulty, our described challenge taxonomy remains relevant.

What problem does this paper attempt to address?

The paper attempts to address the core challenges in dialogue summarization and provides a systematic literature review to unify and detail these challenges. Specifically:

1. **Challenges of Dialogue Summarization**: The task of dialogue summarization involves extracting key information from dialogues and generating concise, coherent summaries. Although previous studies have reviewed this topic, there is a lack of comprehensive work that details the core challenges of dialogue summarization, unifies different understandings of the task, and aligns proposed techniques, datasets, and evaluation metrics with these challenges.
2. **Systematic Literature Review**: The paper systematically reviews 1,262 unique research papers published between 2019 and 2024 (based on the Semantic Scholar and DBLP databases), summarizing the research progress on Transformer-based dialogue summarization techniques. The review covers the main challenges of dialogue summarization (language, structure, comprehension, speaker, salience, and factuality) and their corresponding techniques, such as graph-based methods, additional training tasks, and planning strategies.
3. **Challenge Classification**: The paper proposes a classification system for dialogue summarization challenges (the CADS taxonomy) with six main challenges: language, structure, comprehension, speaker, salience, and factuality. These challenges are further subdivided into sub-challenges to better understand and address issues in the dialogue summarization process.
4. **Datasets and Evaluation Methods**: The paper also examines how these methods are evaluated, including datasets for different dialogue subdomains (such as meetings, customer service, and healthcare), commonly used automatic evaluation metrics (such as ROUGE), and common human evaluation methods.

Through this work, the paper aims to fill the gap in existing research regarding the definition and handling of dialogue summarization challenges and to provide directions for future research.
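Since ROUGE is named as the most used automatic metric, a minimal sketch of the idea behind ROUGE-1 (unigram overlap between a reference summary and a generated one) may help. This is a simplified illustration and not the official ROUGE implementation: it assumes lowercased whitespace tokenization, applies no stemming or stopword handling, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    Real ROUGE toolkits add stemming and other preprocessing options.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Amanda will bring cookies to the meeting"
    candidate = "Amanda brings cookies to the meeting tomorrow"
    print(rouge1_f1(reference, candidate))
```

Because the score rewards surface word overlap only, a summary that paraphrases correctly can score low and a factually wrong one can score high, which is one reason the review also looks at human evaluation.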

CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization
