CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization

Frederic Kirstein, Jan Philip Wahle, Bela Gipp, Terry Ruas
2024-06-12
Abstract: Abstractive dialogue summarization is the task of distilling conversations into informative and concise summaries. Although reviews have been conducted on this topic, there is a lack of comprehensive work detailing the challenges of dialogue summarization, unifying the differing understanding of the task, and aligning proposed techniques, datasets, and evaluation metrics with the challenges. This article summarizes the research on Transformer-based abstractive summarization for English dialogues by systematically reviewing 1262 unique research papers published between 2019 and 2024, relying on the Semantic Scholar and DBLP databases. We cover the main challenges present in dialogue summarization (i.e., language, structure, comprehension, speaker, salience, and factuality) and link them to corresponding techniques such as graph-based approaches, additional training tasks, and planning strategies, which typically rely too heavily on BART-based encoder-decoder models. We find that while some challenges, like language, have seen considerable progress, mainly due to training methods, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities. We investigate how these approaches are typically assessed, covering the datasets for the subdomains of dialogue (e.g., meeting, medical), the established automatic metrics, and the human evaluation approaches for assessing scores and annotator agreement. We observe that only a few datasets span all subdomains. The ROUGE metric is the most used, while human evaluation is frequently reported without sufficient detail on inter-annotator agreement and annotation guidelines. Additionally, we discuss the possible implications of the recently explored large language models and conclude that despite a potential shift in relevance and difficulty, our described challenge taxonomy remains relevant.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper attempts to address the core challenges in dialogue summarization and provides a systematic literature review to unify and detail these challenges. Specifically:
1. **Challenges of Dialogue Summarization**: Dialogue summarization involves extracting key information from dialogues and generating concise, coherent summaries. Although previous reviews have covered this topic, there is a lack of comprehensive work that details the core challenges of dialogue summarization, unifies different understandings of the task, and aligns proposed techniques, datasets, and evaluation metrics with these challenges.
2. **Systematic Literature Review**: The paper systematically reviews 1,262 unique research papers published between 2019 and 2024 (drawn from the Semantic Scholar and DBLP databases), summarizing research progress on Transformer-based abstractive dialogue summarization for English. It covers the main challenges of dialogue summarization (language, structure, comprehension, speaker, salience, and factuality) and links them to corresponding techniques, such as graph-based methods, additional training tasks, and planning strategies.
3. **Challenge Classification**: The paper proposes a classification system for dialogue summarization challenges (the CADS taxonomy) with six main challenges: language, structure, comprehension, speaker, salience, and factuality. These challenges are further subdivided into sub-challenges to better understand and address issues in the dialogue summarization process.
4. **Datasets and Evaluation Methods**: The paper also examines how these methods are evaluated, including datasets for different dialogue subdomains (such as meetings, customer service, and healthcare), commonly used automatic evaluation metrics (such as ROUGE), and common human evaluation methods.
Through this work, the paper aims to fill the gap in existing research on how dialogue summarization challenges are defined and handled, and to provide directions for future research.