ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema

Fei Wang, Yuewen Zheng, Qin Li, Jingyi Wu, Pengfei Li, Luxia Zhang
2024-07-26
Abstract:
Objective: This study introduces ChatSchema, an effective method for extracting and structuring information from unstructured data in paper-based medical reports. It combines Large Multimodal Models (LMMs) and Optical Character Recognition (OCR), guided by a schema. By integrating a predefined schema, we aim to enable LMMs to extract information directly and standardize it according to the schema specifications, which facilitates subsequent data entry.
Method: Our approach is a two-stage process: classification, which categorizes the report scenario, and extraction, which structures the information. We built and annotated a dataset to verify the effectiveness of ChatSchema and evaluated key extraction using precision, recall, F1-score, and accuracy metrics. Building on key extraction, we further assessed value extraction. We conducted ablation studies on two LMMs to show how different input modalities and methods improve structured information extraction.
Result: We analyzed 100 medical reports from Peking University First Hospital and established a ground truth dataset of 2,945 key-value pairs. We evaluated ChatSchema with GPT-4o and Gemini 1.5 Pro and found that GPT-4o achieved higher overall performance. The GPT-4o results are as follows. For key extraction, key precision was 98.6%, key recall was 98.5%, and key F1-score was 98.6%. For value extraction, computed over correctly extracted keys, overall accuracy was 97.2%, and precision, recall, and F1-score were each 95.8%. An ablation study showed that ChatSchema achieved significantly higher overall accuracy and overall F1-score for key-value extraction than the Baseline, with increases of 26.9% and 27.4%, respectively.
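The key and value metrics described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes that a key counts as correctly extracted only when it is identical to a ground truth key, that counts are pooled (micro-averaged) across reports, and that a value is correct only when it equals the ground truth value after trimming whitespace. The names `evaluate`, `Scores`, and `Record` are hypothetical, and value precision, recall, and F1-score are omitted because their exact definitions are not given here.

```python
from __future__ import annotations

from dataclasses import dataclass

# One report's extraction result or ground truth: key -> value (assumed shape).
Record = dict[str, str]


@dataclass
class Scores:
    key_precision: float
    key_recall: float
    key_f1: float
    value_accuracy: float  # computed over correctly extracted keys only


def evaluate(pairs: list[tuple[Record, Record]]) -> Scores:
    """Micro-averaged scores over (predicted, ground_truth) record pairs.

    Assumptions: exact key matching, and whitespace-trimmed exact value matching.
    """
    n_pred = n_true = key_hits = value_hits = 0
    for predicted, truth in pairs:
        hits = predicted.keys() & truth.keys()  # correctly extracted keys
        n_pred += len(predicted)
        n_true += len(truth)
        key_hits += len(hits)
        # A value is scored only if its key was extracted correctly.
        value_hits += sum(
            1 for k in hits if predicted[k].strip() == truth[k].strip()
        )
    precision = key_hits / n_pred if n_pred else 0.0
    recall = key_hits / n_true if n_true else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = value_hits / key_hits if key_hits else 0.0
    return Scores(precision, recall, f1, accuracy)


if __name__ == "__main__":
    # Illustrative single report; field names are hypothetical.
    truth = {"Hemoglobin": "135 g/L", "WBC": "6.2", "Platelets": "210"}
    pred = {"Hemoglobin": "135 g/L", "WBC": "6.8", "Glucose": "5.1"}
    print(evaluate([(pred, truth)]))
```

In the demo, two of the three predicted keys match the ground truth, so key precision, recall, and F1-score are all 2/3. Of those two matched keys, only `Hemoglobin` has the right value, so value accuracy is 1/2. Pooling counts means reports with more fields carry more weight. A per-report (macro) average would be an equally plausible reading of the paper.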
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the problem of extracting and structuring information from medical reports. Specifically, the authors propose **ChatSchema**, a method that combines Large Multimodal Models (LMMs) with Optical Character Recognition (OCR) to extract information from unstructured medical reports and standardize it according to predefined schemas. The goal is to make medical report data entry more efficient and accurate while reducing the workload of manual processing.

### Main Contributions

1. **Method Innovation**: Combines LMMs and OCR, using predefined schemas to guide information extraction and standardization.
2. **Two-Stage Process**: A classification stage identifies the report scenario, and an extraction stage then extracts the structured information.
3. **Dataset Construction**: Builds a ground truth dataset of 100 medical reports from Peking University First Hospital, containing 2,945 key-value pairs in total.
4. **Performance Evaluation**: Evaluates ChatSchema with precision, recall, F1-score, and accuracy, and compares it with a baseline.
5. **Experimental Validation**: Runs experiments with two LMMs, GPT-4o and Gemini 1.5 Pro. Both show strong key and value extraction, and GPT-4o achieves the higher overall performance.

### Experimental Results

- **Key Extraction**: GPT-4o performed best, with a key precision of 98.6%, key recall of 98.5%, and key F1-score of 98.6%.
- **Value Extraction**: Computed over correctly extracted keys, overall accuracy was 97.2%, and precision, recall, and F1-score were each 95.8%.
- **Ablation Study**: Combining image and text input did not significantly improve performance over image-only or text-only input. However, ChatSchema still outperformed the baseline, with increases of 26.9% in overall accuracy and 27.4% in overall F1-score.
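The two-stage flow can be sketched as below. The first stage classifies the report scenario, and the second extracts the fields that scenario's schema defines. Everything here is an illustrative assumption rather than the authors' implementation. The `SCHEMAS` contents, the prompt wording, and the helper names `classify_report`, `extract_fields`, and `run_pipeline` are hypothetical. The LMM is modeled as a plain function from prompt text to reply text, so the report image, which an image-plus-OCR setup would also supply to the model, is omitted.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical schemas: report scenario -> keys to extract (illustrative only).
SCHEMAS: dict[str, list[str]] = {
    "blood_routine": ["WBC", "RBC", "Hemoglobin", "Platelets"],
    "liver_function": ["ALT", "AST", "Total bilirubin"],
}

# Assumption: an LMM is modeled as prompt text -> reply text.
# A real multimodal call would also attach the report image.
LMM = Callable[[str], str]


def classify_report(ocr_text: str, llm: LMM) -> str:
    """Stage 1: ask the model which schema scenario the report belongs to."""
    options = ", ".join(SCHEMAS)
    prompt = (
        f"Classify this medical report as one of: {options}.\n"
        "Answer with the scenario name only.\n\n"
        f"Report:\n{ocr_text}"
    )
    answer = llm(prompt).strip()
    if answer not in SCHEMAS:
        raise ValueError(f"unrecognized scenario: {answer!r}")
    return answer


def extract_fields(ocr_text: str, scenario: str, llm: LMM) -> dict[str, str]:
    """Stage 2: request a JSON object limited to the scenario's schema keys."""
    keys = SCHEMAS[scenario]
    prompt = (
        "Extract these fields from the report and reply with one JSON object "
        f"using exactly these keys: {json.dumps(keys)}. "
        "Use null for fields that are absent.\n\n"
        f"Report:\n{ocr_text}"
    )
    raw = json.loads(llm(prompt))
    if not isinstance(raw, dict):
        raise ValueError("model reply is not a JSON object")
    # Keep only schema keys with non-null values, in schema order, as strings.
    return {k: str(raw[k]) for k in keys if raw.get(k) is not None}


def run_pipeline(ocr_text: str, llm: LMM) -> tuple[str, dict[str, str]]:
    """Run classification, then schema-guided extraction."""
    scenario = classify_report(ocr_text, llm)
    return scenario, extract_fields(ocr_text, scenario, llm)


if __name__ == "__main__":
    # Stand-in model with canned replies, for demonstration only.
    def fake_llm(prompt: str) -> str:
        if prompt.startswith("Classify"):
            return "blood_routine"
        return '{"WBC": 6.2, "Hemoglobin": "135 g/L", "RBC": null, "Extra": 1}'

    print(run_pipeline("WBC 6.2  Hemoglobin 135 g/L", fake_llm))
```

The prompt asks the model to restrict its output to schema keys, and the code enforces the same restriction. As a result, stray fields the model returns anyway (such as `Extra` in the demo) never reach the structured record. Null or missing fields are dropped rather than stored as empty strings.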
### Conclusion

ChatSchema performs well at extracting and structuring information from medical reports, accurately identifying and standardizing key-value pairs. The results indicate that combining LMMs and OCR through prompt engineering and predefined schemas can significantly improve the efficiency of data entry for medical documents. Future research could validate the method on larger and more diverse datasets and explore its application to other domains and languages.