Abstract: Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective measure of information extraction quality becomes imperative. However, the scarcity of labeled data presents significant challenges to this endeavor. In this paper, we introduce an automatic framework to assess the quality and completeness of information extraction/retrieval. The framework focuses on information extracted in the form of entities and their properties. We discuss how to handle the input/output size limitations of large language models and analyze their performance when extracting information. In particular, we introduce scores to evaluate the quality of the extraction and provide an extensive discussion on how to interpret them.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address the issue of quality assessment in information extraction (IE) tasks using large language models (LLMs). Specifically, the paper focuses on the following aspects:

1. **Lack of Annotated Data**: In many custom tasks, it is challenging to objectively assess the quality of information extraction due to the lack of annotated data suitable for the application scope.
2. **Technical Limitations**: LLMs face some technical limitations when processing long texts, such as context window size restrictions and intermediate information loss, which can affect the efficiency and accuracy of information extraction.
3. **Evaluation Framework**: Existing evaluation methods typically rely on manually annotated data, which is both time-consuming and expensive. Therefore, an automated and general method is needed to assess the quality of information extraction.

To address these issues, the paper proposes an automated evaluation framework that creates synthetic ground truth by inserting artificially generated information (referred to as "needles") into documents, enabling the assessment of information extraction quality without manually annotated data.

### Main Contributions

1. **Introduction of MINEA Score**: The paper proposes a new scoring metric, Multiple Injection Needle Extraction Accuracy (MINEA), to objectively evaluate the quality of information extraction.
2. **Handling Long Texts**: It discusses how to address the technical limitations of LLMs when processing long texts and proposes some improvements, such as segmented extraction and iterative invocation.
3. **Synthetic Ground Truth**: By inserting artificially generated "needles" into documents, a synthetic ground truth is created, allowing for quality assessment without annotated data.
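The needle-based evaluation and segmented extraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The needle format (`name`, `sentence`), the function names, the character-based segment limit, and the case-insensitive exact name match are all hypothetical choices made for the example. The paper's actual matching criteria for MINEA are not reproduced here. `toy_extractor` stands in for an LLM call.

```python
import random
from typing import Callable, Dict, List

# Hypothetical needle format: an entity name plus a sentence mentioning it.
Needle = Dict[str, str]


def inject_needles(paragraphs: List[str], needles: List[Needle], seed: int = 0) -> List[str]:
    """Insert each needle sentence at a random paragraph boundary."""
    rng = random.Random(seed)
    out = list(paragraphs)
    for needle in needles:
        out.insert(rng.randint(0, len(out)), needle["sentence"])
    return out


def segment(paragraphs: List[str], max_chars: int) -> List[str]:
    """Greedily group whole paragraphs into segments of at most max_chars.

    Paragraphs are never split, so a single over-long paragraph
    becomes its own (over-long) segment.
    """
    segments: List[str] = []
    current = ""
    for p in paragraphs:
        candidate = p if not current else current + "\n\n" + p
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = p
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def needle_accuracy(needles: List[Needle], extracted_names: List[str]) -> float:
    """Fraction of injected needles whose name was extracted.

    Simplifying assumption: a needle counts as found on a
    case-insensitive exact match of its name.
    """
    if not needles:
        raise ValueError("at least one needle is required")
    found = {name.strip().lower() for name in extracted_names}
    hits = sum(1 for n in needles if n["name"].strip().lower() in found)
    return hits / len(needles)


def toy_extractor(text: str) -> List[str]:
    """Stand-in for an LLM extraction call; it only knows one entity."""
    known = ("Zorblat Industries",)
    return [name for name in known if name in text]


def evaluate(paragraphs: List[str], needles: List[Needle],
             extractor: Callable[[str], List[str]], max_chars: int = 80) -> float:
    """Inject needles, extract segment by segment, and score the result."""
    doc = inject_needles(paragraphs, needles)
    names = [name for seg in segment(doc, max_chars) for name in extractor(seg)]
    return needle_accuracy(needles, names)


needles = [
    {"name": "Zorblat Industries", "sentence": "Zorblat Industries opened an office in Brno."},
    {"name": "Quindle Marsh", "sentence": "Quindle Marsh is a nature reserve."},
]
paragraphs = ["Intro paragraph.", "Second paragraph.", "Closing."]
score = evaluate(paragraphs, needles, toy_extractor)
print(f"needle accuracy: {score:.2f}")  # the toy extractor finds 1 of 2 needles
```

Because the needles are synthetic, the expected extractions are known in advance, so the score can be computed without any manual annotation of the original document.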
### Application Scenarios

The method proposed in the paper is applicable to information extraction tasks in various fields, especially in medical, legal, and business domains, where large-scale annotated data is often lacking. The automated evaluation framework can significantly reduce the time and resources required for expert manual review, improving the efficiency and accuracy of information extraction.

Assessing the quality of information extraction
