Assessing the quality of information extraction

Filip Seitl, Tomáš Kovářík, Soheyla Mirshahi, Jan Kryštůfek, Rastislav Dujava, Matúš Ondreička, Herbert Ullrich, Petr Gronat
2024-05-22
Abstract: Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective measure for the quality of information extraction becomes imperative. However, the scarcity of labeled data presents significant challenges to this endeavor. In this paper, we introduce an automatic framework to assess the quality of information extraction/retrieval and its completeness. The framework focuses on information extracted in the form of an entity and its properties. We discuss how to handle the input/output size limitations of large language models and analyze their performance when extracting the information. In particular, we introduce scores to evaluate the quality of the extraction and provide an extensive discussion on how to interpret them.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the issue of quality assessment in information extraction (IE) tasks using large language models (LLMs). Specifically, the paper focuses on the following aspects:

1. **Lack of Annotated Data**: In many custom tasks, it is challenging to objectively assess the quality of information extraction due to the lack of annotated data suitable for the application scope.
2. **Technical Limitations**: LLMs face some technical limitations when processing long texts, such as context window size restrictions and intermediate information loss, which can affect the efficiency and accuracy of information extraction.
3. **Evaluation Framework**: Existing evaluation methods typically rely on manually annotated data, which is both time-consuming and expensive. Therefore, an automated and general method is needed to assess the quality of information extraction.

To address these issues, the paper proposes an automated evaluation framework that creates synthetic ground truth by inserting artificially generated information (referred to as "needles") into documents, enabling the assessment of information extraction quality without manually annotated data.

### Main Contributions

1. **Introduction of MINEA Score**: The paper proposes a new scoring metric, Multiple Injection Needle Extraction Accuracy (MINEA), to objectively evaluate the quality of information extraction.
2. **Handling Long Texts**: It discusses how to address the technical limitations of LLMs when processing long texts and proposes some improvements, such as segmented extraction and iterative invocation.
3. **Synthetic Ground Truth**: By inserting artificially generated "needles" into documents, a synthetic ground truth is created, allowing for quality assessment without annotated data.
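
The needle-based evaluation described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the summary does not state the exact MINEA formula, so the sketch assumes the score is the share of injected needles whose entity name reappears in the extractor output. The function names `inject_needles`, `needle_accuracy`, and `toy_extractor`, the example company names, and the regex standing in for an LLM call are all hypothetical.

```python
import random
import re
from collections.abc import Iterable


def inject_needles(paragraphs: list[str], needles: list[str], seed: int = 0) -> list[str]:
    """Return a copy of `paragraphs` with each needle inserted at a random position."""
    rng = random.Random(seed)  # seeded so the synthetic ground truth is reproducible
    result = list(paragraphs)
    for needle in needles:
        position = rng.randint(0, len(result))  # randint is inclusive on both ends
        result.insert(position, needle)
    return result


def needle_accuracy(needle_names: Iterable[str], extracted_names: Iterable[str]) -> float:
    """Assumed score: fraction of injected entity names found in the extraction."""
    expected = [name.casefold() for name in needle_names]
    if not expected:
        raise ValueError("at least one needle is required")
    found = {name.casefold() for name in extracted_names}
    hits = sum(1 for name in expected if name in found)
    return hits / len(expected)


def toy_extractor(text: str) -> list[str]:
    """Hypothetical stand-in for an LLM extraction call: finds names ending in 'Corp'."""
    return re.findall(r"\b[A-Z][a-z]+Corp\b", text)


if __name__ == "__main__":
    document = ["Quarterly revenue rose.", "The board met in May."]
    # Needles: made-up entities (name -> sentence) that cannot occur in the source text.
    needles = {
        "ZylaCorp": "ZylaCorp opened an office in Brno.",
        "QuorCorp": "QuorCorp hired 40 engineers.",
    }
    augmented = inject_needles(document, list(needles.values()))
    extracted = toy_extractor(" ".join(augmented))
    print(needle_accuracy(needles.keys(), extracted))
```

In a real setting the extractor would be the LLM pipeline under test, and matching would likely need to be more tolerant than exact name comparison, for example by comparing entity properties as well as names.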
### Application Scenarios

The method proposed in the paper is applicable to information extraction tasks in various fields, especially in medical, legal, and business domains, where large-scale annotated data is often lacking. The automated evaluation framework can significantly reduce the time and resources required for expert manual review, improving the efficiency and accuracy of information extraction.