Abstract:

Objective: This study introduces ChatSchema, a schema-based method for extracting and structuring information from unstructured data in medical paper reports using a combination of Large Multimodal Models (LMMs) and Optical Character Recognition (OCR). By integrating a predefined schema, we intend to enable LMMs to directly extract and standardize information according to the schema specifications, facilitating further data entry.

Method: Our approach is a two-stage process: classification, which categorizes the report scenario, and extraction, which structures the information. We established and annotated a dataset to verify the effectiveness of ChatSchema, and evaluated key extraction using precision, recall, F1-score, and accuracy. Based on key extraction, we further assessed value extraction. We conducted ablation studies on two LMMs to show how structured information extraction improves with different input modalities and methods.

Results: We analyzed 100 medical reports from Peking University First Hospital and established a ground truth dataset with 2,945 key-value pairs. We evaluated ChatSchema using GPT-4o and Gemini 1.5 Pro and found that GPT-4o performed better overall. The results are as follows: for key extraction, key-precision was 98.6%, key-recall was 98.5%, and key-F1-score was 98.6%. For value extraction based on correct key extraction, the overall accuracy was 97.2%, precision was 95.8%, recall was 95.8%, and F1-score was 95.8%. An ablation study demonstrated that ChatSchema achieved significantly higher overall accuracy and overall F1-score for key-value extraction than the Baseline, with increases of 26.9% in overall accuracy and 27.4% in overall F1-score.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses the problem of extracting and structuring information from medical reports. Specifically, the authors propose a method called **ChatSchema**, which uses Large Multimodal Models (LMMs) and Optical Character Recognition (OCR) to extract and standardize information from unstructured medical reports based on predefined schemas. With this method, the researchers hope to improve the efficiency and accuracy of medical report data entry and reduce the workload of manual processing.

### Main Contributions

1. **Method Innovation**: Combines LMMs and OCR, using predefined schemas to guide information extraction and standardization.
2. **Two-Stage Process**: A classification stage identifies the report scenario, and an extraction stage extracts the structured information.
3. **Dataset Construction**: Established a ground truth dataset of 100 medical reports with a total of 2,945 key-value pairs.
4. **Performance Evaluation**: Evaluated ChatSchema using precision, recall, F1 score, and accuracy, and compared it with a baseline method.
5. **Experimental Validation**: Conducted experiments on two LMMs, GPT-4o and Gemini 1.5 Pro, with GPT-4o performing better overall in key extraction and value extraction.

### Experimental Results

- **Key Extraction**: On GPT-4o, key precision was 98.6%, key recall was 98.5%, and key F1 score was 98.6%.
- **Value Extraction**: Based on correct key extraction, overall accuracy was 97.2%, with a precision of 95.8%, recall of 95.8%, and F1 score of 95.8%.
- **Ablation Study**: Using both images and text did not significantly improve performance compared to using only images or only text, but still outperformed the baseline. ChatSchema exceeded the Baseline by 26.9% in overall accuracy and 27.4% in overall F1 score for key-value extraction.
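To make the key-level and value-level metrics above concrete, here is a minimal Python sketch for a single report. It assumes that predictions and ground truth are flat dictionaries of key-value pairs, that a key counts as correct when it appears in both, and that a value counts as correct on exact string match after trimming whitespace. These matching rules, the function names, and the sample data are illustrative assumptions, not the paper's evaluation protocol.

```python
from typing import Dict, Tuple


def key_metrics(predicted: Dict[str, str], truth: Dict[str, str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over the extracted keys of one report."""
    pred_keys, true_keys = set(predicted), set(truth)
    true_positives = len(pred_keys & true_keys)  # keys present in both
    precision = true_positives / len(pred_keys) if pred_keys else 0.0
    recall = true_positives / len(true_keys) if true_keys else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def value_accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Share of correctly extracted keys whose value also matches.

    Assumption: exact match after stripping whitespace.
    """
    shared = set(predicted) & set(truth)
    if not shared:
        return 0.0
    correct = sum(predicted[k].strip() == truth[k].strip() for k in shared)
    return correct / len(shared)


# Hypothetical sample data, for illustration only.
truth = {"patient_name": "Zhang San", "hemoglobin": "132 g/L", "wbc": "6.1"}
predicted = {"patient_name": "Zhang San", "hemoglobin": "123 g/L", "platelets": "210"}

precision, recall, f1 = key_metrics(predicted, truth)
accuracy = value_accuracy(predicted, truth)
```

In this sample, two of the three predicted keys are in the ground truth, and one of those two has the right value. Value accuracy is computed only over keys that were extracted correctly, which mirrors the "value extraction based on correct key extraction" framing.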
### Conclusion

The ChatSchema method performs excellently in extracting and structuring information from medical reports, effectively identifying and standardizing key-value pairs. The research results indicate that combining LMMs and OCR technology, through prompt engineering and predefined schemas, can significantly improve the efficiency of data entry processing for medical documents. Future research can further validate this method on more diverse and larger-scale datasets and explore its applications in other fields and languages.
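The two-stage, schema-guided flow described above (classify the report scenario, then extract against that scenario's schema) could be sketched roughly as follows. Everything here is a hypothetical illustration: the `SCHEMAS` registry, the prompt wording, and the `call_model` callable are assumptions and are not taken from the paper. The sketch passes only OCR text, whereas the paper's pipeline also feeds the report image to the LMM.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical schema registry: report scenario -> expected keys.
SCHEMAS: Dict[str, List[str]] = {
    "blood_test": ["patient_name", "hemoglobin", "white_blood_cell_count"],
    "radiology": ["patient_name", "modality", "impression"],
}


def classify_report(ocr_text: str, call_model: Callable[[str], str]) -> str:
    """Stage 1: ask the model which predefined scenario the report belongs to."""
    prompt = (
        "Classify this medical report into one of: "
        + ", ".join(SCHEMAS)
        + ".\nReply with the label only.\n\n"
        + ocr_text
    )
    label = call_model(prompt).strip()
    if label not in SCHEMAS:
        raise ValueError(f"unknown scenario: {label!r}")
    return label


def extract_fields(
    ocr_text: str, scenario: str, call_model: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    """Stage 2: extract values for exactly the keys in the scenario's schema."""
    keys = SCHEMAS[scenario]
    prompt = (
        "Extract the following fields as a JSON object with exactly these keys "
        f"(use null when a field is absent): {json.dumps(keys)}\n\n{ocr_text}"
    )
    raw = json.loads(call_model(prompt))
    # Keep only schema keys; anything the model adds is dropped.
    return {key: raw.get(key) for key in keys}


def run_pipeline(ocr_text: str, call_model: Callable[[str], str]) -> dict:
    scenario = classify_report(ocr_text, call_model)
    return {"scenario": scenario, "fields": extract_fields(ocr_text, scenario, call_model)}


def fake_model(prompt: str) -> str:
    """Stand-in for a real LMM call so the sketch runs offline."""
    if prompt.startswith("Classify"):
        return "blood_test"
    return json.dumps({"patient_name": "Zhang San", "hemoglobin": "132 g/L", "extra": "x"})


result = run_pipeline("Name: Zhang San  HGB 132 g/L", fake_model)
```

Filtering the model output down to the schema keys is one simple way to keep the result ready for data entry: missing fields become `None` and unexpected fields are discarded. A real deployment would replace `fake_model` with an actual model client and handle malformed JSON.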

ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema
