Abstract: Recently, knowledge-enhanced methods that leverage auxiliary knowledge graphs have emerged in relation extraction, surpassing traditional text-based approaches. However, to the best of our knowledge, no public dataset currently encompasses both evidence sentences and knowledge graphs for knowledge-enhanced relation extraction. To address this gap, we introduce the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED annotates each sentence with a relational fact and provides knowledge context for entities through entity linking. Using our curated dataset, we compared contemporary relation extraction methods under two prevalent task settings: sentence-level and bag-level. The experimental results show that the knowledge graphs provided by KERED can support knowledge-enhanced relation extraction methods. We believe that KERED offers high-quality relation extraction datasets with corresponding knowledge graphs for evaluating the performance of knowledge-enhanced relation extraction methods. Our dataset is available at: https://figshare.com/projects/KERED/134459
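The abstract contrasts two task settings. In the sentence-level setting each sentence is classified on its own; in the bag-level setting, sentences that mention the same entity pair are grouped into a bag and a relation is predicted for the bag as a whole. The minimal sketch below illustrates that grouping. The `Instance` schema, the placeholder entity IDs, and the relation names are hypothetical and do not reflect KERED's actual file format; labeling a bag with the set of its sentences' relations is likewise an assumption made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One annotated sentence (hypothetical schema, not KERED's actual format)."""
    text: str
    head: str      # ID of the linked head entity (placeholder IDs below)
    tail: str      # ID of the linked tail entity
    relation: str  # relational fact annotated for this sentence


def group_into_bags(instances):
    """Bag-level setting: collect all sentences that share a (head, tail) pair."""
    bags = defaultdict(list)
    for inst in instances:
        bags[(inst.head, inst.tail)].append(inst)
    return dict(bags)


def bag_labels(bags):
    """Assumption for illustration: a bag's label is the set of its sentences' relations."""
    return {pair: {inst.relation for inst in insts} for pair, insts in bags.items()}


# Toy data with made-up entity IDs and relation names.
data = [
    Instance("Obama was born in Honolulu.", "ent:Obama", "ent:Honolulu", "place_of_birth"),
    Instance("Honolulu is where Obama was born.", "ent:Obama", "ent:Honolulu", "place_of_birth"),
    Instance("Paris is the capital of France.", "ent:Paris", "ent:France", "capital_of"),
]

# Sentence-level setting: each Instance is one example (3 here).
# Bag-level setting: one example per entity pair (2 bags here).
bags = group_into_bags(data)
labels = bag_labels(bags)
```

Grouping by the linked entity IDs rather than by surface strings means that different mentions of the same entity fall into the same bag, which is one reason entity linking matters for the bag-level setting.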

What problem does this paper attempt to address?

The main problem this paper addresses is the lack of a public relation extraction (RE) dataset paired with knowledge graphs (KGs). Specifically:

1. **Problem Background**:
   - Knowledge-enhanced relation extraction methods that use auxiliary knowledge graphs have surpassed traditional text-based methods.
   - However, there is currently no public dataset that contains both evidence sentences and knowledge graphs for training and evaluating knowledge-enhanced relation extraction methods.
2. **Specific Problems**:
   - Without a standardized benchmark dataset, it is difficult for researchers to report reproducible results or to compare the performance of existing methods.
   - Researchers have typically had to build their own auxiliary knowledge graphs, create their own datasets, and rerun earlier baselines to make comparisons fair.
3. **Solutions**:
   - The paper introduces the Knowledge-Enhanced Relation Extraction Dataset (KERED) to fill this gap.
   - KERED improves three widely used RE datasets (NYT10m, Wiki20m, and Wiki80) and constructs an auxiliary knowledge graph for each of them.
   - Through entity linking and data refinement, KERED provides high-quality relation extraction datasets and their corresponding KGs for evaluating knowledge-enhanced relation extraction methods.
4. **Contributions**:
   - KERED itself: three challenging RE datasets with their auxiliary KGs, intended to advance research on knowledge-enhanced relation extraction.
   - Evaluation metrics for knowledge-enhanced relation extraction on KERED, which the paper uses to evaluate state-of-the-art RE methods.
   - Experimental results showing that information from the auxiliary KGs has a positive impact on relation extraction methods.
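KERED provides knowledge context for entities by linking them to an auxiliary KG. As an illustration only, the sketch below assumes the KG is a plain list of `(head, relation, tail)` triples and takes the 1-hop neighborhood of the two linked entities as their context. How KERED actually selects context is not described here, and the helper names `build_adjacency` and `knowledge_context` are hypothetical. The sketch also drops any triple that directly connects the head and tail entities, a precaution worth taking in any such setup, since otherwise the target fact could leak into the model's input.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Index KG triples (head, relation, tail) by every entity they touch."""
    adjacency = defaultdict(set)
    for h, r, t in triples:
        adjacency[h].add((h, r, t))
        adjacency[t].add((h, r, t))
    return adjacency


def knowledge_context(adjacency, head, tail):
    """Return the 1-hop triples around the linked head and tail entities.

    Triples that directly connect head and tail (in either direction) are
    removed so the fact to be predicted is not handed to the model.
    """
    context = adjacency.get(head, set()) | adjacency.get(tail, set())
    return sorted((h, r, t) for h, r, t in context if {h, t} != {head, tail})


# Toy KG with made-up IDs; the second triple is the target fact itself.
kg = [
    ("ent:Obama", "spouse", "ent:Michelle"),
    ("ent:Obama", "place_of_birth", "ent:Honolulu"),
    ("ent:Honolulu", "located_in", "ent:Hawaii"),
    ("ent:Paris", "capital_of", "ent:France"),
]
adjacency = build_adjacency(kg)
ctx = knowledge_context(adjacency, "ent:Obama", "ent:Honolulu")
# ctx == [("ent:Honolulu", "located_in", "ent:Hawaii"),
#         ("ent:Obama", "spouse", "ent:Michelle")]
```

Using `.get` instead of indexing keeps the `defaultdict` from silently gaining empty entries for entities that were linked but have no triples in the KG.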
### Formula Explanation

The formulas in the paper are used to evaluate the experimental results. The key formulas are:

- **Micro F1**:

  \[ F1=\frac{2\times\text{precision}\times\text{recall}}{\text{precision}+\text{recall}} \]

  where

  \[ \text{precision}=\frac{TP}{TP+FP},\quad\text{recall}=\frac{TP}{TP+FN} \]

  Here \(TP\), \(FP\), and \(FN\) are the numbers of true positives, false positives, and false negatives, counted globally over all relation classes (hence "micro") rather than averaged per class.

- **Micro AP (Average Precision)**:

  \[ AP=\sum_{i=2}^{n}\text{precision}_i\times(\text{recall}_i-\text{recall}_{i-1}) \]

  where \(\text{precision}_i\) and \(\text{recall}_i\) are the global precision and recall at the \(i\)-th threshold, and \(n\) is the total number of samples, i.e. the ranked predictions, each of which defines one threshold.

With these refined datasets and evaluation metrics, the paper provides important resources and benchmarks for knowledge-enhanced relation extraction and supports further development in this field.
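The two metrics above can be computed directly from their definitions. The minimal sketch below assumes gold and predicted facts are sets of hypothetical `(instance_id, relation)` pairs and that each prediction carries a confidence score; it is not the paper's evaluation code. As printed, the AP sum starts at \(i = 2\), so the first threshold contributes nothing; the common convention instead starts at \(i = 1\) with \(\text{recall}_0 = 0\), which the hypothetical `include_first` flag reproduces. Tied scores are ordered arbitrarily here.

```python
def micro_prf(gold, pred):
    """Micro precision, recall, and F1.

    gold and pred are sets of (instance_id, relation) facts, so TP, FP, and FN
    are pooled over all relation classes before the ratios are computed.
    """
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def micro_ap(scored, gold, include_first=False):
    """Micro AP over predictions ranked by confidence.

    scored is a list of (fact, score) pairs; cutting the ranking after the
    i-th prediction gives threshold i. By default the sum runs from i = 2,
    as in the printed formula; include_first=True adds the i = 1 term with
    recall_0 = 0.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked or not gold:
        return 0.0
    tp = 0
    precisions, recalls = [], []
    for k, (fact, _score) in enumerate(ranked, start=1):
        if fact in gold:
            tp += 1
        precisions.append(tp / k)
        recalls.append(tp / len(gold))
    # Python index j corresponds to threshold i = j + 1.
    ap = sum(precisions[j] * (recalls[j] - recalls[j - 1]) for j in range(1, len(ranked)))
    if include_first:
        ap += precisions[0] * recalls[0]
    return ap


gold = {("s1", "founder"), ("s2", "spouse"), ("s3", "capital_of")}
pred = {("s1", "founder"), ("s2", "founder"), ("s3", "capital_of")}
p, r, f1 = micro_prf(gold, pred)  # TP=2, FP=1, FN=1, so all three equal 2/3

scored = [
    (("s1", "founder"), 0.9),
    (("s2", "founder"), 0.8),
    (("s3", "capital_of"), 0.7),
    (("s2", "spouse"), 0.1),
]
ap = micro_ap(scored, gold)                                  # 17/36
ap_from_first = micro_ap(scored, gold, include_first=True)   # 29/36
```

The gap between 17/36 and 29/36 on this toy ranking shows how much the choice of starting index can matter when only a few predictions are scored.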

Knowledge-Enhanced Relation Extraction Dataset

CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild

Rescue Implicit and Long-tail Cases: Nearest Neighbor Relation Extraction

Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks

KICE: A Knowledge Consolidation and Expansion Framework for Relation Extraction

Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study

DocRED: A Large-Scale Document-Level Relation Extraction Dataset

MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction

Knowledge-Enhanced Relation Extraction for Chinese EMRs

HacRED: A Large-Scale Relation Extraction Dataset Toward Hard Cases in Practical Applications

EDeR: A Dataset for Exploring Dependency Relations Between Events

Label-Free Distant Supervision for Relation Extraction via Knowledge Graph Embedding

CrossRE: A Cross-Domain Dataset for Relation Extraction

REKER: Relation Extraction with Knowledge of Entity and Relation

GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction

Knowledge-Aware And Retrieval-Based Models For Distantly Supervised Relation Extraction

DocRED-FE: A Document-Level Fine-Grained Entity And Relation Extraction Dataset

Enhancing cross-evidence reasoning graph for document-level relation extraction

Knowledge-Driven Cross-Document Relation Extraction

Evidence-aware Document-level Relation Extraction

BioRED: a rich biomedical relation extraction dataset