MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation

Xiaozhi Wang,Hao Peng,Yong Guan,Kaisheng Zeng,Jianhui Chen,Lei Hou,Xu Han,Yankai Lin,Zhiyuan Liu,Ruobing Xie,Jie Zhou,Juanzi Li
2024-06-19
Abstract: Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships. However, due to the annotation challenges brought by task complexity, a large-scale dataset covering the full process of event understanding has long been absent. In this paper, we introduce MAVEN-Arg, which augments the MAVEN datasets with event argument annotations, making it the first all-in-one dataset supporting event detection, event argument extraction (EAE), and event relation extraction. As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained with laborious human annotation; (3) exhaustive annotation supporting all task variants of EAE, covering both entity and non-entity event arguments at the document level. Experiments indicate that MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary large language models (LLMs). Furthermore, to demonstrate the benefits of an all-in-one dataset, we preliminarily explore a potential application, future event prediction, with LLMs. MAVEN-Arg and code can be obtained from https://github.com/THU-KEG/MAVEN-Argument.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the lack of a large-scale dataset that covers all tasks of event understanding in natural language processing. Existing datasets either cover only event detection (ED) and event argument extraction (EAE), such as ACE 2005 and TAC KBP, or cover only a single task: RAMS and WikiEvents, for instance, cover only EAE, while other datasets cover only event relation extraction (ERE). Because task complexity makes annotation difficult, these datasets usually cover only thousands of events, and they cannot be used together because their event schemas and underlying data are inconsistent.

To solve these problems, the paper introduces MAVEN-Arg, a new dataset that adds event argument annotations to the MAVEN dataset and aims to be the first all-in-one dataset supporting event detection, event argument extraction, and event relation extraction. The main advantages of MAVEN-Arg include:

1. **Comprehensive event schema**: It contains 162 event types and 612 argument roles, all with definitions and examples written by experts.
2. **Large-scale data**: It contains 98,591 events and 290,613 arguments, all obtained through manual annotation.
3. **Exhaustive annotation**: It supports all task variants of EAE and annotates both entity and non-entity event arguments at the document level.

In addition, the experimental results show that both fine-tuned EAE models and proprietary large language models (LLMs) perform far from satisfactorily on MAVEN-Arg, indicating that the dataset is quite challenging and that more research is needed to develop practical EAE methods.

To demonstrate the advantages of an all-in-one event understanding dataset, the paper also preliminarily explores future event prediction: causally related event chains are sampled from MAVEN-Arg, and LLMs are used to predict the types and arguments of future events.
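To make the future event prediction setup more concrete, here is a minimal Python sketch of how an annotated event chain might be turned into an LLM prompt. Everything in it is an assumption made for illustration: the `Argument` and `Event` classes, the `describe` and `build_prompt` helpers, the prompt wording, and the example event types and roles are hypothetical and do not reflect the actual MAVEN-Arg file format or the authors' prompts. The only figures taken from the source are the event and argument counts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Argument:
    role: str        # argument role name (illustrative value, not from the real schema)
    text: str        # surface mention of the argument in the document
    is_entity: bool  # True for entity arguments, False for non-entity arguments


@dataclass
class Event:
    event_type: str  # event type name (illustrative)
    trigger: str     # trigger word marking the event occurrence
    arguments: list[Argument] = field(default_factory=list)


def describe(event: Event) -> str:
    """Render one event as a compact single-line string."""
    parts = [f"trigger={event.trigger}"]
    parts += [f"{a.role}={a.text}" for a in event.arguments]
    return f"{event.event_type}({'; '.join(parts)})"


def build_prompt(chain: list[Event]) -> str:
    """Format an ordered event chain and ask for the next event (hypothetical wording)."""
    lines = [f"{i}. {describe(e)}" for i, e in enumerate(chain, start=1)]
    lines.append(f"{len(chain) + 1}. ?")
    header = "Predict the type and arguments of the next event:"
    return header + "\n" + "\n".join(lines)


# Average number of annotated arguments per event, from the counts in the abstract.
AVG_ARGS_PER_EVENT = 290_613 / 98_591

if __name__ == "__main__":
    chain = [
        Event("Attack", "bombed", [Argument("Agent", "the army", True)]),
        Event("Death", "killed"),
    ]
    print(build_prompt(chain))
    print(f"Average arguments per event: {AVG_ARGS_PER_EVENT:.2f}")
```

Keeping `is_entity` on each argument mirrors the summary's point that both entity and non-entity arguments are annotated; a real loader would need to follow the format documented in the dataset repository.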