Revisiting the Evaluation of End-to-end Event Extraction.

Shun Zheng, Wei Cao, Wei Xu, Jiang Bian
DOI: https://doi.org/10.18653/v1/2021.findings-acl.405
2021-01-01
Abstract: Event extraction (EE) aims to harvest event instances from plain text, where each instance is composed of a group of event arguments with specific event roles. Existing end-to-end EE research usually adopts role-averaged evaluation, which produces evaluation measures by averaging the evaluation statistics of each event role. Although this averaged metric indicates model performance to some extent, we find that it can be quite misleading for downstream applications that use an event instance as a whole, where one wrongly identified event argument can substantially alter the meaning of the entire instance. To close this gap and provide a more complete understanding of performance, we propose two new evaluation metrics that also consider an event instance as a whole and explicitly penalize wrongly identified event arguments. Moreover, to support the diverse metric preferences that arise in different scenarios, we propose a new training paradigm based on reinforcement learning for a typical end-to-end EE model, Doc2EDAG. Our extensive experiments show that the new training paradigm improves on the original one by a large margin (about 10%) under the new metrics. Nevertheless, current performance is still far from satisfactory, and optimizing towards these new metrics calls for further research.
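
The gap between role-level and instance-level scoring can be illustrated with a small sketch. It is not the paper's metric definition: the abstract does not specify the two proposed metrics, so the strict exact-match score below is only a stand-in for "considering an event instance as a whole". The function names, the dictionary representation of an event, and the assumption that predicted and gold events are already aligned one-to-one are all hypothetical, and the role-level score pools slot statistics rather than reproducing the exact role-averaged procedure.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an event maps each role to its argument,
# or to None when the role is unfilled.
Event = Dict[str, Optional[str]]


def role_level_f1(pairs: List[Tuple[Event, Event]]) -> float:
    """F1 over individual (role, argument) slots of aligned (pred, gold) pairs."""
    tp = fp = fn = 0
    for pred, gold in pairs:
        for role in set(pred) | set(gold):
            p, g = pred.get(role), gold.get(role)
            if p is not None and p == g:
                tp += 1
            else:
                if p is not None:
                    fp += 1  # predicted an argument that is wrong or spurious
                if g is not None:
                    fn += 1  # missed the gold argument for this role
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def instance_exact_match(pairs: List[Tuple[Event, Event]]) -> float:
    """Fraction of aligned pairs in which every role matches (assumed strict metric)."""
    if not pairs:
        return 0.0
    correct = sum(
        1
        for pred, gold in pairs
        if all(pred.get(r) == gold.get(r) for r in set(pred) | set(gold))
    )
    return correct / len(pairs)


# Two made-up events with four roles each; every prediction has one wrong argument.
gold_a = {"Pledger": "A Corp", "Pledgee": "B Bank", "Shares": "1000", "Date": "2020-01-01"}
pred_a = {**gold_a, "Pledgee": "C Bank"}
gold_b = {"Pledger": "D Corp", "Pledgee": "E Bank", "Shares": "500", "Date": "2020-02-01"}
pred_b = {**gold_b, "Shares": "5000"}
example_pairs = [(pred_a, gold_a), (pred_b, gold_b)]

print(role_level_f1(example_pairs))         # slot-level score
print(instance_exact_match(example_pairs))  # whole-instance score
```

In this toy case three of every four slots are correct, so the slot-level score stays high, while no predicted event is fully correct and the whole-instance score drops to zero. This is the kind of mismatch the abstract describes for downstream applications that consume events as a whole.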