YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction

Xinglin Xiao, Yijie Wang, Nan Xu, Yuqi Wang, Hanxuan Yang, Minzheng Wang, Yin Luo, Lei Wang, Wenji Mao, Daniel Zeng
2024-04-02
Abstract: The difficulty of the information extraction task lies in dealing with task-specific label schemas and heterogeneous data structures. Recent work has proposed methods based on large language models to uniformly model different information extraction tasks. However, these existing methods are deficient in their information extraction capabilities for languages other than English, such as Chinese. In this paper, we propose an end-to-end chat-enhanced instruction tuning framework for universal information extraction (YAYI-UIE), which supports both Chinese and English. Specifically, we utilize dialogue data and information extraction data jointly to enhance information extraction performance. Experimental results show that our proposed framework achieves state-of-the-art performance on Chinese datasets while also achieving comparable performance on English datasets under both supervised and zero-shot settings.
Artificial Intelligence
What problem does this paper attempt to address?
This paper proposes YAYI-UIE, a chat-enhanced instruction tuning framework for universal information extraction that supports both Chinese and English. The difficulty of information extraction lies in handling the label schemas and heterogeneous data structures specific to each task. Existing methods mainly rely on large language models to model different information extraction tasks in a unified way, but they extract information poorly in languages other than English, especially Chinese. YAYI-UIE addresses this with two-step instruction tuning. First, the base language model is fine-tuned on dialogue data to obtain a chat model with general understanding ability. Second, the chat model is further tuned on information extraction data, including a comprehensive Chinese information extraction benchmark that the authors constructed, to improve its performance on information extraction tasks. Experiments show that YAYI-UIE achieves state-of-the-art performance on Chinese datasets and comparable performance on English datasets. The authors also built the largest Chinese instruction tuning benchmark, comprising 16 datasets from different domains. The paper compares YAYI-UIE with other methods such as UIE, USM, and InstructUIE, and reports its advantage in both supervised and zero-shot settings, especially on Chinese tasks.
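To make the two-step idea more concrete, the sketch below shows one way an extraction sample could be cast as an instruction-tuning example and how the two tuning stages might be ordered. The prompt wording, the field names (`instruction`, `output`), the helper `build_ie_example`, and the sample sentence are all assumptions made for illustration; they are not the prompt format or data used by the paper.

```python
import json


def build_ie_example(task, text, schema, answer):
    """Cast one extraction sample as an (instruction, output) pair.

    Hypothetical format: the task name, label schema, and input text go
    into the instruction; the structured answer is serialized as JSON.
    """
    instruction = (
        f"Task: {task}\n"
        f"Schema: {json.dumps(schema, ensure_ascii=False)}\n"
        f"Text: {text}\n"
        "Return the result as JSON."
    )
    # ensure_ascii=False keeps Chinese characters readable in the target string.
    output = json.dumps(answer, ensure_ascii=False)
    return {"instruction": instruction, "output": output}


# Illustrative named entity recognition sample (invented for this sketch).
example = build_ie_example(
    task="named entity recognition",
    text="Alice joined Acme Corp in Paris.",
    schema=["person", "organization", "location"],
    answer={
        "person": ["Alice"],
        "organization": ["Acme Corp"],
        "location": ["Paris"],
    },
)

# Assumed ordering of the two tuning stages described in the summary:
# dialogue data first, then instruction-formatted extraction data.
stages = [
    ("chat tuning", "dialogue data"),
    ("information extraction tuning", "instruction-formatted IE data"),
]

if __name__ == "__main__":
    print(example["instruction"])
    print(example["output"])
    for name, data in stages:
        print(f"{name}: {data}")
```

Putting the label schema into the instruction is what lets a single model serve tasks with different label sets, since the labels become part of the input rather than part of the model's output layer.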