Abstract: Document-level Event Argument Extraction (EAE) faces two challenges due to increased input length: 1) difficulty in distinguishing semantic boundaries between events, and 2) interference from redundant information. To address these issues, we propose two methods. The first method introduces the Co and Structure Event Argument Extraction model (CsEAE) based on Small Language Models (SLMs). CsEAE includes a co-occurrence-aware module, which integrates information about all events present in the current input through context labeling and co-occurrence event prompt extraction. Additionally, CsEAE includes a structure-aware module that reduces interference from redundant information by establishing structural relationships between the sentence containing the trigger and the other sentences in the document. The second method introduces new prompts that transform the extraction task into a generative task suitable for Large Language Models (LLMs), addressing the performance gap of LLMs on EAE under Supervised Fine-Tuning (SFT). We also fine-tuned on multiple datasets to develop an LLM that performs better across most datasets. Finally, we applied insights from CsEAE to LLMs, achieving further performance improvements. This suggests that reliable insights validated on SLMs are also applicable to LLMs. We tested our models on the RAMS, WikiEvents, and MLEE datasets. The CsEAE model improved the Arg-C F1 metric by 2.1%, 2.3%, and 3.2%, respectively, over the baseline PAIE [PAIE]. For LLMs, we demonstrated that their performance on document-level datasets is comparable to that of SLMs. (All code is available at https://github.com/simon-p-j-r/CsEAE.)
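The abstract reports its gains in Arg-C F1. As a reminder of what that metric measures, here is a minimal Python sketch under the commonly used definitions: a predicted argument counts as correct for Arg-I when its event and span offsets match a gold argument, and for Arg-C when its role also matches. The tuple layout and function names are illustrative assumptions. The official scorers for RAMS, WikiEvents, and MLEE (and PAIE's evaluation script) may differ in details such as head-word or coreference matching.

```python
from typing import Iterable, Set, Tuple

# Illustrative tuple layout (an assumption, not an official data format):
# (event_id, span_start, span_end, role)
Argument = Tuple[str, int, int, str]


def _prf(pred: Set[tuple], gold: Set[tuple]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from exact matching of two sets."""
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


def _without_role(args: Iterable[Argument]) -> Set[tuple]:
    """Keep only (event_id, span_start, span_end), dropping the role."""
    return {(event, start, end) for event, start, end, _role in args}


def arg_i_f1(predicted: Iterable[Argument],
             gold: Iterable[Argument]) -> Tuple[float, float, float]:
    """Arg-I: the event and span offsets must match; the role is ignored."""
    return _prf(_without_role(predicted), _without_role(gold))


def arg_c_f1(predicted: Iterable[Argument],
             gold: Iterable[Argument]) -> Tuple[float, float, float]:
    """Arg-C: the event, span offsets, and role must all match."""
    return _prf(set(predicted), set(gold))


if __name__ == "__main__":
    # Events e1 and e2 share the span (10, 11), with different roles.
    gold = {("e1", 3, 4, "Attacker"), ("e1", 10, 11, "Target"), ("e2", 10, 11, "Victim")}
    # The prediction finds every span but assigns the wrong role for e1.
    pred = {("e1", 3, 4, "Attacker"), ("e1", 10, 11, "Victim"), ("e2", 10, 11, "Victim")}
    print(f"Arg-I F1: {arg_i_f1(pred, gold)[2]:.3f}")  # Arg-I F1: 1.000
    print(f"Arg-C F1: {arg_c_f1(pred, gold)[2]:.3f}")  # Arg-C F1: 0.667
```

The example mirrors the shared-argument situation the paper targets: confusing the role of a span that two events share leaves Arg-I untouched but lowers Arg-C, which is why Arg-C is the stricter and more informative number.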

What problem does this paper attempt to address?

This paper addresses two key problems in document-level event argument extraction (EAE):

1. **Difficulty in distinguishing semantic boundaries between events**: As the input text grows longer, the semantic boundaries between different events become blurred. This is especially problematic when multiple events share the same text span as an argument, which makes it hard for the model to keep the events apart.
2. **Interference from redundant information**: Longer documents contain more information, including not only details useful for the task but also a large amount of irrelevant, redundant content. This redundant information degrades the model's argument extraction performance.

To address these two problems, the authors propose two methods:

### Method 1: Co-occurrence and Structure-aware Event Argument Extraction Model (CsEAE) based on Small Language Models (SLMs)

- **Co-occurrence-aware module**: Marks trigger words in the context and encodes co-occurrence event prompts so that all co-occurring events in the input are visible to the model, helping it capture the semantic boundaries between events.
- **Structure-aware module**: Establishes structural relationships between the trigger sentence and the other sentences, reducing interference from redundant information so that the model can focus on relevant content.

### Method 2: Prompt Design and Supervised Fine-Tuning based on Large Language Models (LLMs)

- **New prompt design**: Designs specific prompts for each dataset, transforming the extraction task into a generation task suitable for LLMs.
- **Supervised Fine-Tuning (SFT)**: Fine-tunes on multiple datasets so that LLMs perform better on document-level EAE.

In addition, the authors apply the insights validated on CsEAE to LLMs, further improving LLM performance. On the RAMS, WikiEvents, and MLEE benchmarks, CsEAE improves Arg-C F1 over the PAIE baseline by 2.1%, 2.3%, and 3.2%, respectively, and the fine-tuned LLMs reach performance comparable to SLMs on document-level datasets.

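The two CsEAE modules described above can be illustrated with a small Python sketch. It is hedged throughout: the marker strings (`<t>`, `</t>` for the target trigger and `<o>`, `</o>` for other co-occurring triggers), the function names, and the "star" attention pattern are assumptions made for illustration, and the paper's actual context labeling and structural relations may be defined differently. `mark_triggers` shows the context-labeling idea, in which every trigger in the input is made visible while the current event's trigger is distinguished from the others. `structure_mask` shows one possible way to encode sentence-level structure: tokens in the trigger sentence may attend to the whole document, while tokens elsewhere attend only to their own sentence and the trigger sentence.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical marker tokens (assumption): the target event's trigger vs.
# triggers of other events co-occurring in the same input.
TARGET_OPEN, TARGET_CLOSE = "<t>", "</t>"
OTHER_OPEN, OTHER_CLOSE = "<o>", "</o>"


def mark_triggers(tokens: Sequence[str],
                  target: Tuple[int, int],
                  others: Sequence[Tuple[int, int]]) -> List[str]:
    """Wrap trigger spans (half-open [start, end) token offsets) in markers.

    Assumes that trigger spans do not overlap one another.
    """
    opens: Dict[int, str] = {target[0]: TARGET_OPEN}
    closes: Dict[int, str] = {target[1]: TARGET_CLOSE}
    for start, end in others:
        opens[start] = OTHER_OPEN
        closes[end] = OTHER_CLOSE
    marked: List[str] = []
    for i, token in enumerate(tokens):
        if i in closes:  # a span ended just before token i
            marked.append(closes[i])
        if i in opens:  # a span starts at token i
            marked.append(opens[i])
        marked.append(token)
    if len(tokens) in closes:  # a span ends at the last token
        marked.append(closes[len(tokens)])
    return marked


def structure_mask(sentence_ids: Sequence[int],
                   trigger_sentence: int) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if token i may attend to token j.

    Star-shaped structure (assumption): the trigger sentence is connected
    to every sentence; other sentences are connected only to themselves.
    """
    n = len(sentence_ids)
    return [
        [
            sentence_ids[i] == trigger_sentence
            or sentence_ids[j] == trigger_sentence
            or sentence_ids[i] == sentence_ids[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    tokens = ["Rebels", "attacked", "the", "town", "and", "killed", "two"]
    print(" ".join(mark_triggers(tokens, target=(1, 2), others=[(5, 6)])))
    # Rebels <t> attacked </t> the town and <o> killed </o> two
    mask = structure_mask([0, 0, 1, 1, 2], trigger_sentence=1)
    print(mask[0][4], mask[0][2], mask[4][3])
    # False True True
```

In a Transformer encoder, a boolean mask like this would typically be converted into an additive bias (0 where attention is allowed, a large negative value elsewhere) before the softmax. Sentence IDs should be assigned to the token sequence after the markers are inserted, so that mask positions line up with the encoder input.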
In summary, this paper aims to improve the accuracy and robustness of document-level event argument extraction by introducing co-occurrence-aware and structure-aware mechanisms and by optimizing prompt design and fine-tuning strategies for LLMs.
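The LLM side of the approach, recasting extraction as generation and fine-tuning on the result, can be sketched as follows. This is a hypothetical prompt layout and answer format, not the prompts released with the paper: the instruction wording, the `role: argument1; argument2` output convention, and the function names are all assumptions. `build_prompt` turns one event instance into an instruction, `format_target` shows how gold arguments could be serialized into the target text used for SFT, and `parse_answer` reads generated text back into role-to-argument lists.

```python
from typing import Dict, List, Sequence

# Hypothetical prompt and answer format (assumption), for illustration only.


def build_prompt(document: str, trigger: str, event_type: str,
                 roles: Sequence[str]) -> str:
    """Turn one extraction instance into an instruction for an LLM."""
    return (
        "Extract the arguments of the event below from the document.\n"
        f"Document: {document}\n"
        f"Event type: {event_type}\n"
        f"Trigger: {trigger}\n"
        f"Roles: {', '.join(roles)}\n"
        "Answer with one line per role as 'role: argument1; argument2', "
        "writing 'None' if the role has no argument."
    )


def format_target(arguments: Dict[str, List[str]], roles: Sequence[str]) -> str:
    """Serialize gold arguments into the answer text used as the SFT target."""
    lines = []
    for role in roles:
        values = arguments.get(role, [])
        lines.append(f"{role}: {'; '.join(values) if values else 'None'}")
    return "\n".join(lines)


def parse_answer(answer: str, roles: Sequence[str]) -> Dict[str, List[str]]:
    """Read generated text back into a role -> list-of-arguments mapping."""
    result: Dict[str, List[str]] = {role: [] for role in roles}
    for line in answer.splitlines():
        role, sep, value = line.partition(":")
        role = role.strip()
        if not sep or role not in result:
            continue  # skip lines that do not start with a known role
        for arg in value.split(";"):
            arg = arg.strip()
            if arg and arg.lower() != "none":
                result[role].append(arg)
    return result


if __name__ == "__main__":
    roles = ["Attacker", "Target", "Place"]
    gold = {"Attacker": ["Rebels"], "Target": ["the town", "the airport"]}
    print(build_prompt("Rebels attacked the town and the airport.",
                       "attacked", "Attack", roles))
    target = format_target(gold, roles)
    print(target)
    # Attacker: Rebels
    # Target: the town; the airport
    # Place: None
    print(parse_answer(target, roles))
    # {'Attacker': ['Rebels'], 'Target': ['the town', 'the airport'], 'Place': []}
```

Emitting one line per role with an explicit `None` keeps the output easy to parse and lets the model signal empty roles. Because an LLM generates argument strings rather than offsets, those strings must be matched back to document spans before span-based metrics such as Arg-C F1 can be computed. This simple format also breaks if an argument itself contains a semicolon.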

One Small and One Large for Document-level Event Argument Extraction

ULTRA: Unleash LLMs' Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Refinement

Utilizing Contextual Clues and Role Correlations for Enhancing Document-level Event Argument Extraction

Beyond Single-Event Extraction: Towards Efficient Document-Level Multi-Event Argument Extraction

Revisiting Event Argument Extraction: Can EAE Models Learn Better When Being Aware of Event Co-occurrences?

DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction

Learning to Ask for Data-Efficient Event Argument Extraction

Document-Level Event Argument Extraction with Sparse Representation Attention

Prompt for Extraction? PAIE: Prompting Argument Interaction for Event Argument Extraction

Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types

From Simple to Complex: A Progressive Framework for Document-level Informative Argument Extraction

Incorporating Schema-Aware Description into Document-Level Event Extraction

EA²E: Improving Consistency with Event Awareness for Document-Level Argument Extraction

Event Extraction by Associating Event Types and Argument Roles

A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction

A Semantic Mention Graph Augmented Model for Document-Level Event Argument Extraction

Document-Level Event Argument Extraction by Conditional Generation

Is a Large Language Model a Good Annotator for Event Extraction?

Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models

Joint Event Extraction via Structural Semantic Matching

LAAP: Learning the Argument of An Entity with Event Prompts for document-level event extraction