Abstract: In real life, many dynamic events, such as major disasters and large-scale sports events, evolve continuously over time. An overview of such an event helps people quickly understand the situation and respond more effectively. Producing one is challenging because the key information is often scattered across multiple documents and requires complex event knowledge understanding and reasoning, which is under-explored in previous work. We therefore propose the Event-Centric Multi-Document Summarization (ECS) task, which aims to generate concise and comprehensive summaries of a given event from multiple related news documents. To facilitate research on this task, we construct the EventSum dataset from Baidu Baike entries with extensive human annotation. It is the first large-scale Chinese multi-document summarization dataset, containing 5,100 events and a total of 57,984 news documents, with an average of 11.4 input news documents and 13,471 characters per event. To ensure data quality and mitigate potential data leakage, we adopt a multi-stage annotation approach for manually labeling the test set. Because event-related information is complex, existing metrics struggle to comprehensively assess the quality of generated summaries. We therefore design specific metrics, namely Event Recall, Argument Recall, Causal Recall, and Temporal Recall, along with corresponding calculation methods. We conduct comprehensive experiments on EventSum to evaluate advanced long-context Large Language Models (LLMs) on this task. Our experimental results indicate that: 1) the event-centric multi-document summarization task remains challenging for existing long-context LLMs; 2) the recall metrics we designed are crucial for evaluating the comprehensiveness of the summary information.
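
The abstract names four recall metrics but does not spell out how they are computed. As a minimal sketch, assuming each metric is the share of reference items (events, arguments, causal relations, or temporal relations) that a generated summary covers, the computation could look like the following. The function names, the substring-based matcher, and the toy data are all hypothetical illustrations, not the paper's method; a real matcher would need semantic comparison rather than string containment.

```python
from typing import Callable, Iterable


def recall(reference_items: Iterable[str],
           summary: str,
           is_covered: Callable[[str, str], bool]) -> float:
    """Fraction of reference items that the summary covers.

    `is_covered` is a pluggable matcher (hypothetical); the same function
    can serve events, arguments, causal or temporal relations.
    """
    items = list(reference_items)
    if not items:
        return 0.0  # assumption: define recall as 0 when nothing is annotated
    hits = sum(1 for item in items if is_covered(item, summary))
    return hits / len(items)


def substring_match(item: str, summary: str) -> bool:
    """Naive stand-in matcher: case-insensitive substring containment."""
    return item.lower() in summary.lower()


# Toy example (invented data, for illustration only).
reference_events = ["earthquake struck", "rescue teams arrived", "power restored"]
summary = "An earthquake struck the region and rescue teams arrived within hours."

event_recall = recall(reference_events, summary, substring_match)
print(f"Event Recall: {event_recall:.2f}")
```

Keeping the matcher as a parameter means the same `recall` function can be reused for each of the four item types once a suitable comparison is available.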

A Large-Scale Chinese Long-Text Extractive Summarization Corpus

LCSTS: A Large Scale Chinese Short Text Summarization Dataset

CNewSum: A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level

CLEEK: A Chinese Long-text Corpus for Entity Linking

CLTS+: A New Chinese Long Text Summarization Dataset with Abstractive Summaries

Global Encoding for Long Chinese Text Summarization

LSICC: A Large Scale Informal Chinese Corpus

ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model

CSL: A Large-scale Chinese Scientific Literature Dataset

Long-Document Cross-Lingual Summarization

Integrating Extractive and Abstractive Models for Long Text Summarization

Building Large Chinese Corpus for Spoken Dialogue Research in Specific Domains

UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation

CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels

Mining a Large Chinese-English Corpus from Web

Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation

CNTLS: A Benchmark Dataset for Abstractive or Extractive Chinese Timeline Summarization

EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents

WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations

The BQ Corpus: A Large-scale Domain-specific Chinese Corpus for Sentence Semantic Equivalence Identification