SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

Leandro Carísio Fernandes, Gustavo Bartz Guedes, Thiago Soares Laitz, Thales Sales Almeida, Rodrigo Nogueira, Roberto Lotufo, Jayr Pereira
2024-08-29
Abstract: Document summarization is a task to shorten texts into concise and informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of generated summaries.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the application of Multi-Document Summarization (MDS) to generating scientific literature surveys. Specifically, the authors propose a new dataset, **SurveySum**, for summarizing multiple scientific articles into a section of a survey. The main contributions of the paper are:

1. **SurveySum Dataset**: A dataset that addresses the gap in domain-specific summarization tools, focusing on generating text for scientific surveys.
2. **Two Summarization Pipelines**: Two specific methods for summarizing scientific articles into survey sections.
3. **Evaluation Experiments**: An assessment of the two pipelines using multiple metrics to compare their performance.

The results highlight the importance of a high-quality retrieval stage and the impact of different configurations on the quality of the generated summaries. The paper also emphasizes that the choice of large language model (LLM) influences the final summary quality. Overall, the study aims to advance automatic summarization of scientific literature.