YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction

Xinglin Xiao, Yijie Wang, Nan Xu, Yuqi Wang, Hanxuan Yang, Minzheng Wang, Yin Luo, Lei Wang, Wenji Mao, Daniel Zeng

2024-04-02

Abstract: The difficulty of the information extraction task lies in dealing with task-specific label schemas and heterogeneous data structures. Recent work has proposed methods based on large language models to uniformly model different information extraction tasks. However, these existing methods are deficient in their information extraction capabilities for languages other than English, such as Chinese. In this paper, we propose an end-to-end chat-enhanced instruction tuning framework for universal information extraction (YAYI-UIE), which supports both Chinese and English. Specifically, we use dialogue data and information extraction data jointly to enhance information extraction performance. Experimental results show that our proposed framework achieves state-of-the-art performance on Chinese datasets while also achieving comparable performance on English datasets under both supervised and zero-shot settings.

Artificial Intelligence

What problem does this paper attempt to address?

This paper proposes YAYI-UIE, a chat-enhanced instruction tuning framework for universal information extraction that supports both Chinese and English. The difficulty of information extraction lies in handling task-specific label schemas and heterogeneous data structures. Existing methods rely on large language models to model different information extraction tasks in a unified way, but they extract information poorly in languages other than English, Chinese in particular. YAYI-UIE addresses this with two-step instruction tuning: first, the base language model is fine-tuned on dialogue data to obtain a chat model with general language understanding; second, the chat model is further tuned on information extraction data, including what the authors describe as the most comprehensive Chinese information extraction benchmark, which they constructed, to improve its performance on extraction tasks. Experiments show that YAYI-UIE achieves state-of-the-art performance on Chinese datasets and comparable performance on English datasets. The authors also describe their Chinese instruction tuning benchmark, which includes 16 datasets from different domains, as the largest of its kind. The paper compares YAYI-UIE with other methods such as UIE, USM, and InstructUIE, and reports an advantage in both supervised and zero-shot settings, especially on Chinese tasks.
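The two-step recipe described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the prompt layout, the function names (build_ie_example, training_plan), and the stage labels are hypothetical and are not taken from the paper, and no actual model training is performed. It shows how an extraction sample might be packed into an instruction/response pair, and how dialogue data would be scheduled before information extraction data.

```python
import json


def build_ie_example(task, schema, text, answer):
    """Pack one extraction sample into an instruction/response pair.

    The prompt wording is a hypothetical layout, not the paper's template.
    """
    instruction = (
        f"Text: {text}\n"
        f"Task: {task}. Extract items for these labels: {', '.join(schema)}.\n"
        "Answer in JSON."
    )
    # ensure_ascii=False keeps Chinese characters readable in the target.
    return {
        "instruction": instruction,
        "response": json.dumps(answer, ensure_ascii=False),
    }


# Assumed stage order, following the summary: dialogue data first,
# information extraction data second. Stage names are illustrative.
STAGES = [
    {"name": "chat_tuning", "data": "dialogue"},
    {"name": "ie_tuning", "data": "information_extraction"},
]


def training_plan(dialogue_samples, ie_samples):
    """Return (stage name, samples) pairs in the order they would be used."""
    pools = {"dialogue": dialogue_samples, "information_extraction": ie_samples}
    return [(stage["name"], pools[stage["data"]]) for stage in STAGES]


example = build_ie_example(
    task="named entity recognition",
    schema=["person", "organization"],
    text="Alice joined Acme Corp in 2020.",
    answer={"person": ["Alice"], "organization": ["Acme Corp"]},
)
plan = training_plan(
    dialogue_samples=[{"instruction": "Hello", "response": "Hi, how can I help?"}],
    ie_samples=[example],
)
```

Expressing every task as the same instruction/response shape is what lets a single model cover different label schemas: only the schema list and the JSON target change between named entity recognition, relation extraction, and event extraction.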

InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction

ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models

RUIE: Retrieval-based Unified Information Extraction using Large Language Model

Unified Structure Generation for Universal Information Extraction

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning

QuAChIE: Question Answering based Chinese Information Extraction System

OpenUE: an Open Toolkit of Universal Extraction from Text

SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction

Diluie: Constructing Diverse Demonstrations of In-Context Learning with Large Language Model for Unified Information Extraction

FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction

A Unified Visual Prompt Tuning Framework with Mixture-of-Experts for Multimodal Information Extraction

RexUIE: A Recursive Method with Explicit Schema Instructor for Universal Information Extraction

AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks

An Effective System for Multi-format Information Extraction

Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting

Retrieval-Augmented Code Generation for Universal Information Extraction

UniEX: An Effective and Efficient Framework for Unified Information Extraction via a Span-extractive Perspective