Abstract: Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships. However, due to the annotation challenges brought by task complexity, a large-scale dataset covering the full process of event understanding has long been absent. In this paper, we introduce MAVEN-Arg, which augments the MAVEN datasets with event argument annotations, making it the first all-in-one dataset supporting event detection, event argument extraction (EAE), and event relation extraction. As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained with laborious human annotation; (3) exhaustive annotation supporting all task variants of EAE, which annotates both entity and non-entity event arguments at the document level. Experiments indicate that MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary large language models (LLMs). Furthermore, to demonstrate the benefits of an all-in-one dataset, we preliminarily explore a potential application, future event prediction, with LLMs. MAVEN-Arg and code can be obtained from https://github.com/THU-KEG/MAVEN-Argument.
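To make the annotation structure described in the abstract concrete, here is a minimal Python sketch of how a document-level EAE record holding both entity and non-entity arguments might be represented. The class names, field names, event type, and role names are hypothetical illustrations, not the actual MAVEN-Arg data format; the MAVEN-Arg repository documents the real schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Argument:
    """One annotated argument; all field names here are hypothetical."""
    role: str                          # argument role from the event schema
    text: str                          # surface text of the argument span
    sent_id: int                       # sentence containing the span
    offset: tuple[int, int]            # token span [start, end) in that sentence
    entity_id: Optional[str] = None    # set for entity arguments, None otherwise


@dataclass
class Event:
    event_type: str
    trigger: str
    sent_id: int                       # sentence containing the trigger
    arguments: list[Argument] = field(default_factory=list)


@dataclass
class Document:
    doc_id: str
    sentences: list[list[str]]         # tokenized sentences
    events: list[Event] = field(default_factory=list)


def split_by_kind(event: Event) -> tuple[list[Argument], list[Argument]]:
    """Separate entity arguments from non-entity (plain span) arguments."""
    entity = [a for a in event.arguments if a.entity_id is not None]
    non_entity = [a for a in event.arguments if a.entity_id is None]
    return entity, non_entity


def cross_sentence(event: Event) -> list[Argument]:
    """Arguments outside the trigger's sentence, which document-level EAE must recover."""
    return [a for a in event.arguments if a.sent_id != event.sent_id]


def span_text(doc: Document, arg: Argument) -> str:
    """Rebuild an argument's text from its sentence index and token offsets."""
    start, end = arg.offset
    return " ".join(doc.sentences[arg.sent_id][start:end])


# Hypothetical example: the event type and role names are illustrative only.
doc = Document(
    doc_id="doc-1",
    sentences=[
        ["Rebels", "attacked", "the", "convoy", "."],
        ["The", "assault", "left", "the", "region", "in", "chaos", "."],
    ],
)
attack = Event("Attack", "attacked", sent_id=0, arguments=[
    Argument("Attacker", "Rebels", 0, (0, 1), entity_id="E1"),
    Argument("Target", "the convoy", 0, (2, 4), entity_id="E2"),
    Argument("Result", "the region in chaos", 1, (3, 7)),  # non-entity, cross-sentence
])
doc.events.append(attack)
```

Keeping `entity_id` optional lets one structure hold both argument kinds, so entity-based and span-based EAE variants could be evaluated from the same records.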

What problem does this paper attempt to address?

The paper addresses the lack of a large-scale dataset in natural language processing that covers the full process of event understanding. Existing datasets cover only part of this process: ACE 2005 and TAC KBP support event detection (ED) and event argument extraction (EAE), document-level datasets such as RAMS and WikiEvents focus on EAE, and none of them offers large-scale annotation spanning ED, EAE, and event relation extraction (ERE) together. Because of the annotation challenges brought by task complexity, these datasets usually cover only thousands of events, and their inconsistent event schemas and data prevent them from being used in a unified way. To address this, the paper introduces MAVEN-Arg, which adds event argument annotations to the MAVEN datasets, making it the first all-in-one dataset supporting ED, EAE, and ERE. The main advantages of MAVEN-Arg are:

1. **Comprehensive event schema**: 162 event types and 612 argument roles, all with expert-written definitions and examples.
2. **Large data scale**: 98,591 events and 290,613 arguments, all obtained through manual annotation.
3. **Exhaustive annotation**: it supports all task variants of EAE, annotating both entity and non-entity event arguments at the document level.

Experiments show that even the latest fine-tuned EAE models and large language models (LLMs) perform far from satisfactorily on MAVEN-Arg, indicating that the dataset is quite challenging and that more research is needed to develop practical EAE methods. To demonstrate the benefits of an all-in-one event understanding dataset, the paper also preliminarily explores future event prediction: causally related event chains are sampled from MAVEN-Arg, and LLMs are asked to predict the types and arguments of future events.
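The future event prediction setup can be illustrated with a minimal Python sketch: sample a chain of causally linked events by walking cause-to-effect edges, describe all but the last event to the model, and keep the last event as the prediction target. The graph format, event type and role names, prompt wording, and helper names are hypothetical; the paper's actual sampling procedure and prompts may differ.

```python
import random
from typing import Optional

# Hypothetical format: event id -> ids of events it directly causes.
CausalGraph = dict[str, list[str]]
# Hypothetical format: event id -> (event type, {role: argument text}).
EventInfo = dict[str, tuple[str, dict[str, str]]]


def sample_causal_chain(effects: CausalGraph, length: int,
                        rng: random.Random) -> Optional[list[str]]:
    """Try one random cause-to-effect walk per start event; return the first
    walk that reaches `length` distinct events, or None if none does."""
    starts = [e for e, outs in effects.items() if outs]
    rng.shuffle(starts)
    for start in starts:
        chain = [start]
        while len(chain) < length:
            # Skip events already in the chain so the walk never loops.
            options = [e for e in effects.get(chain[-1], []) if e not in chain]
            if not options:
                break
            chain.append(rng.choice(options))
        if len(chain) == length:
            return chain
    return None


def build_prompt(chain: list[str],
                 events: EventInfo) -> tuple[str, tuple[str, dict[str, str]]]:
    """Describe all but the last event; the last event is the prediction target."""
    lines = ["The following events are causally linked, in order:"]
    for i, eid in enumerate(chain[:-1], start=1):
        event_type, args = events[eid]
        arg_str = ", ".join(f"{role}: {text}" for role, text in args.items())
        lines.append(f"{i}. {event_type} ({arg_str})")
    lines.append("Predict the type and arguments of the next event.")
    return "\n".join(lines), events[chain[-1]]


# Hypothetical toy data: three events linked e1 -> e2 -> e3.
effects = {"e1": ["e2"], "e2": ["e3"], "e3": []}
events = {
    "e1": ("Attack", {"Attacker": "rebels", "Target": "a bridge"}),
    "e2": ("Destroying", {"Patient": "the bridge"}),
    "e3": ("Blocking", {"Location": "the main road"}),
}
chain = sample_causal_chain(effects, 3, random.Random(0))
prompt, target = build_prompt(chain, events)
```

A single walk per start event is cheap but can miss chains that an exhaustive search would find; the seeded `random.Random` keeps sampling reproducible.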

MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation
