An Empirical Study on Information Extraction using Large Language Models

Ridong Han, Chaohao Yang, Tao Peng, Prayag Tiwari, Xiang Wan, Lu Liu, Benyou Wang
2024-09-10
Abstract: Human-like large language models (LLMs), especially the most powerful and popular ones in OpenAI's GPT family, have proven to be very helpful for many natural language processing (NLP) related tasks. Therefore, various attempts have been made to apply LLMs to information extraction (IE), which is a fundamental NLP task that involves extracting information from unstructured plain text. To demonstrate the latest representative progress in LLMs' information extraction ability, we assess the information extraction ability of GPT-4 (the latest version of GPT at the time of writing this paper) from four perspectives: Performance, Evaluation Criteria, Robustness, and Error Types. Our results suggest a visible performance gap between GPT-4 and state-of-the-art (SOTA) IE methods. To alleviate this problem, considering the LLMs' human-like characteristics, we propose and analyze the effects of a series of simple prompt-based methods, which can be generalized to other LLMs and NLP tasks. Rich experiments show our methods' effectiveness and some of their remaining issues in improving GPT-4's information extraction ability.
Computation and Language
What problem does this paper attempt to address?
The paper aims to evaluate the performance of large language models (especially GPT-4) in information extraction tasks through empirical research and propose improvement methods to narrow the performance gap with existing best methods. Specifically, the paper evaluates GPT-4's information extraction capabilities from four perspectives:

1. **Performance**: Comparing GPT-4 with existing best methods across multiple datasets.
2. **Evaluation Criteria**: Discussing evaluation methods suitable for large models generating human-like responses.
3. **Robustness**: Analyzing the stability of the model's performance in different scenarios.
4. **Error Types**: Summarizing common types of errors in information extraction tasks.

The study finds that although GPT-4 can match or exceed existing best methods in some simple tasks, there is still a significant performance gap in most complex tasks. To address this issue, the authors propose several prompt-based improvement methods that help enhance GPT-4's information extraction capabilities.
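
To make the Performance and Evaluation Criteria perspectives more concrete, here is a minimal sketch of how extracted entities might be scored against gold annotations. It is not the paper's evaluation code: the `Span` type, the `normalize` helper, the `relaxed` flag, and the toy data are all assumptions introduced for illustration. The relaxed mode (case and whitespace normalization) is just one hypothetical way to tolerate the free-form phrasing an LLM can produce, and is not claimed to be the criterion the authors use.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one extraction: (entity text, entity type).
Span = Tuple[str, str]


def normalize(span: Span) -> Span:
    """Lowercase and collapse whitespace (an illustrative relaxation only)."""
    text, label = span
    return (" ".join(text.lower().split()), label.lower())


def prf1(gold: Iterable[Span], pred: Iterable[Span], relaxed: bool = False):
    """Return (precision, recall, F1) over sets of extracted spans.

    With relaxed=False a prediction counts only on an exact match.
    With relaxed=True both sides are normalized before comparison.
    """
    gold_set, pred_set = set(gold), set(pred)
    if relaxed:
        gold_set = {normalize(s) for s in gold_set}
        pred_set = {normalize(s) for s in pred_set}

    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (made up): one prediction differs from gold only in casing,
# and one prediction is spurious.
gold = [("Barack Obama", "PER"), ("Hawaii", "LOC")]
pred = [("barack obama", "PER"), ("Hawaii", "LOC"), ("1961", "DATE")]

strict = prf1(gold, pred)
relaxed = prf1(gold, pred, relaxed=True)
print(f"strict  F1 = {strict[2]:.2f}")
print(f"relaxed F1 = {relaxed[2]:.2f}")
```

On this toy input the strict score penalizes the casing difference while the relaxed score does not, which illustrates why the choice of matching criterion can change the apparent gap between a generative model and a supervised extractor.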