HacRED: A Large-Scale Relation Extraction Dataset Toward Hard Cases in Practical Applications.

Qiao Cheng, Juntao Liu, Xiaoye Qu, Jin Zhao, Jiaqing Liang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan, Yanghua Xiao
DOI: https://doi.org/10.18653/v1/2021.findings-acl.249
2021-01-01
Abstract: Relation extraction (RE) is an essential topic in natural language processing and has attracted extensive attention. Current RE approaches achieve strong results on common datasets, yet they still struggle in practical applications. In this paper, we analyze this performance gap and find that its underlying cause is that practical applications intrinsically contain more hard cases. To make RE models more robust on such practical hard cases, we propose a case-oriented construction framework to build a Hard Case Relation Extraction Dataset (HacRED). HacRED consists of 65,225 relational facts annotated from 9,231 documents with sufficient and diverse hard cases. Notably, HacRED is one of the largest Chinese document-level RE datasets and achieves a high F1 score of 96% on data quality. Furthermore, we apply state-of-the-art RE models to this dataset and conduct a thorough evaluation. The results show that the performance of these models is far below human performance, and that applying RE to practical hard cases still requires further effort. HacRED is publicly available at https://github.com/qiaojiim/HacRED.