Revisiting Relation Extraction in the era of Large Language Models

Somin Wadhwa, Silvio Amir, Byron C. Wallace
2024-07-16
Abstract: Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) Few-shot prompting with GPT-3 achieves near SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.
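To make the sequence-to-sequence framing concrete, the following is a minimal sketch of how relation triples might be linearized into a target string and parsed back out of generated text. The `(head | relation | tail)` template, the `;` separator, and the function names `linearize` and `parse` are illustrative assumptions, not the format used in the paper.

```python
from typing import List, Tuple

# (head entity, relation, tail entity); a hypothetical representation.
Triple = Tuple[str, str, str]


def linearize(triples: List[Triple]) -> str:
    """Serialize triples into one target string for a seq2seq model.

    Assumes entities and relations contain neither '|' nor ';'.
    """
    return " ; ".join(f"({h} | {r} | {t})" for h, r, t in triples)


def parse(target: str) -> List[Triple]:
    """Invert linearize(); malformed chunks are skipped, not raised on."""
    triples: List[Triple] = []
    for chunk in target.split(";"):
        chunk = chunk.strip()
        if not (chunk.startswith("(") and chunk.endswith(")")):
            continue  # generated text may contain stray fragments
        parts = [p.strip() for p in chunk[1:-1].split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


# Made-up example triple, for illustration only.
print(linearize([("aspirin", "causes", "nausea")]))
```

Skipping malformed chunks rather than raising is a deliberate choice here: a generative model gives no guarantee that its output follows the template, so a tolerant parser keeps whatever well-formed triples it did produce.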
Computation and Language
What problem does this paper attempt to address?
The paper addresses three issues in Relation Extraction (RE):

1. **Evaluation of generative models**: Exact matching against reference answers penalizes generated outputs that are correct but worded differently. The paper instead relies on human evaluation to judge whether model outputs agree with the references.
2. **Effectiveness of few-shot learning**: The study examines large language models (such as GPT-3) on RE when given only a few examples (few-shot) and finds that their performance is near the State-of-the-Art (SOTA), roughly equivalent to existing fully supervised models.
3. **Fine-tuning Flan-T5 with explanations**: Flan-T5 is less capable than GPT-3 in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations generated by GPT-3 yields SOTA results.

Through experimental analysis on different datasets (ADE, CoNLL, NYT, and DocRED), the paper demonstrates the effectiveness and applicability of these methods. Training Flan-T5 on GPT-3-generated CoT explanations not only improves its performance but also reduces dependence on the larger GPT-3 model, which makes the approach more practical and cost-effective. The authors release the resulting model as a new baseline for RE tasks.
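The evaluation issue can be illustrated with a small sketch that scores predicted triples against gold triples, first by exact string match and then after light normalization. This is only an illustration of why exact matching is strict for generated text. It is not the paper's procedure, which relies on human evaluation, and the names `normalize` and `triple_f1` as well as the example triples are assumptions.

```python
import string
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def triple_f1(pred: Iterable[Triple], gold: Iterable[Triple],
              relaxed: bool = False) -> float:
    """F1 over sets of triples; relaxed=True compares normalized strings."""
    def key(t: Triple) -> Triple:
        return tuple(normalize(x) for x in t) if relaxed else t

    p = {key(t) for t in pred}
    g = {key(t) for t in gold}
    tp = len(p & g)  # triples present in both sets
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Made-up example: the prediction differs only in casing and punctuation.
gold = [("Aspirin", "causes", "nausea")]
pred = [("aspirin", "causes", "nausea.")]
print(triple_f1(pred, gold))                # exact match
print(triple_f1(pred, gold, relaxed=True))  # normalized match
```

Normalization recovers only surface-level mismatches. Paraphrased entities or differently worded relation labels would still count as errors under this scheme, which is the kind of case that human judgment is needed for.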