Are Triggers Needed for Document-Level Event Extraction?

Shaden Shaar, Wayne Chen, Maitreyi Chatterjee, Barry Wang, Wenting Zhao, Claire Cardie
2024-11-13
Abstract: Most existing work on event extraction has focused on sentence-level texts and presumes the identification of a trigger-span -- a word or phrase in the input that evokes the occurrence of an event of interest. Event arguments are then extracted with respect to the trigger. Indeed, triggers are treated as integral to, and trigger detection as an essential component of, event extraction. In this paper, we provide the first investigation of the role of triggers for the more difficult and much less studied task of document-level event extraction. We analyze their usefulness in multiple end-to-end and pipelined neural event extraction models for three document-level event extraction datasets, measuring performance using triggers of varying quality (human-annotated, LLM-generated, keyword-based, and random). Our research shows that trigger effectiveness varies based on the extraction task's characteristics and data quality, with basic, automatically-generated triggers serving as a viable alternative to human-annotated ones. Furthermore, providing detailed event descriptions to the extraction model helps maintain robust performance even when trigger quality degrades. Perhaps surprisingly, we also find that the mere existence of trigger input, even random ones, is important for prompt-based LLM approaches to the task.
Computation and Language
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper explores the role of triggers in document-level event extraction. Most existing event extraction research focuses on sentence-level text and assumes that a trigger span must first be identified, i.e., the word or phrase in the input that evokes the occurrence of the event of interest. Event arguments are then extracted with respect to that trigger. Triggers are thus treated as a core component of event extraction, and trigger detection as an essential step. For the more difficult and much less studied document-level setting, however, the role of triggers has not been investigated.

The paper presents the first systematic analysis of triggers in document-level event extraction. The authors run multiple end-to-end and pipelined neural event extraction models on three document-level datasets and compare triggers of varying quality: human-annotated, LLM-generated, keyword-based, and random. They find that trigger effectiveness depends on the task's characteristics and on data quality, and that basic, automatically generated triggers are a viable alternative to human-annotated ones. Providing detailed event descriptions to the extraction model helps maintain robust performance when trigger quality degrades. Perhaps surprisingly, the mere presence of trigger input, even a random one, matters for prompt-based LLM approaches.

### Main Research Questions

1. **The role of triggers in document-level event extraction**: Investigate whether triggers are still effective for document-level event extraction tasks.
2. **The effect of triggers of different qualities**: Evaluate how triggers from different sources (e.g., human-annotated, automatically generated) affect document-level event extraction performance.
3. **The impact of triggers on model performance**: Study how the presence or absence of triggers affects event extraction models with different architectures (end-to-end and pipelined).

### Research Methods

- **Datasets**: Three document-level event extraction datasets were used (MUC, WikiEvents, CMNEE).
- **Models**: Four state-of-the-art sequence-to-sequence style document-level event extraction systems (TANL, GTT, DEGREE, GENIE) were studied and compared with prompt-based baselines (GPT-4o and GPT-4o-mini).
- **Trigger Generation**: Triggers came from four sources: human-annotated, LLM-generated, keyword-based, and randomly selected.

### Experimental Results

- **Effectiveness of triggers**: Trigger effectiveness depends on task characteristics and data quality. Basic, automatically generated triggers can substitute for human-annotated ones.
- **Importance of detailed event descriptions**: Providing detailed event descriptions to the model helps maintain robust performance when trigger quality degrades.
- **Role of random triggers**: For prompt-based LLM methods, the mere presence of trigger input matters, even when the trigger is random.

### Conclusion

This paper systematically analyzes the role of triggers in document-level event extraction and finds that triggers are still effective in certain cases, especially when data quality is high. Basic, automatically generated triggers can substitute for human-annotated ones, which is significant for reducing annotation costs. Finally, providing detailed event descriptions to the model improves its robustness to lower-quality triggers.
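To make the keyword-based and random trigger sources listed under Research Methods more concrete, here is a minimal Python sketch of what such selection could look like. It is an illustration under stated assumptions, not the authors' implementation: the function names, the keyword list, the "earliest match wins" rule, and the choice of sampling a single word token are all hypothetical.

```python
import random
import re


def keyword_trigger(document, keywords):
    """Return (start, end, text) for the earliest whole-word keyword match.

    Hypothetical heuristic: the keyword list and the "earliest match wins"
    rule are assumptions made for illustration only.
    """
    best = None
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        match = re.search(pattern, document, flags=re.IGNORECASE)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), match.end(), match.group(0))
    return best


def random_trigger(document, rng):
    """Return (start, end, text) for one uniformly sampled word token.

    Hypothetical baseline: sampling a single word is an assumption; the
    source only states that random triggers were used.
    """
    tokens = list(re.finditer(r"\w+", document))
    if not tokens:
        return None
    match = rng.choice(tokens)
    return (match.start(), match.end(), match.group(0))


doc = "A bomb exploded near the embassy. The attack killed two people."
kw_span = keyword_trigger(doc, ["attack", "exploded"])
rand_span = random_trigger(doc, random.Random(0))
```

Both helpers return character offsets, so a span can be checked against the document with `doc[start:end]` before it is passed to an extraction model as trigger input.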