CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling

Jawook Gu, Han-Cheol Cho, Jiho Kim, Kihyun You, Eun Kyoung Hong, Byungseok Roh
2024-01-21
Abstract: Free-text radiology reports present a rich data source for various medical tasks, but effectively labeling these texts remains challenging. Traditional rule-based labeling methods fall short of capturing the nuances of diverse free-text patterns. Moreover, models using expert-annotated data are limited by data scarcity and pre-defined classes, impacting their performance, flexibility and scalability. To address these issues, our study offers three main contributions: 1) We demonstrate the potential of GPT as an adept labeler using carefully designed prompts. 2) Utilizing only the data labeled by GPT, we trained a BERT-based labeler, CheX-GPT, which operates faster and more efficiently than its GPT counterpart. 3) To benchmark labeler performance, we introduced a publicly available expert-annotated test set, MIMIC-500, comprising 500 cases from the MIMIC validation set. Our findings demonstrate that CheX-GPT not only excels in labeling accuracy over existing models, but also showcases superior efficiency, flexibility, and scalability, supported by our introduction of the MIMIC-500 dataset for robust benchmarking. Code and models are available at
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
This paper addresses the shortcomings of existing methods for labeling chest X-ray reports. Specifically:

1. **Traditional rule-based methods**: These methods cannot capture subtle variations in free text and are vulnerable to spelling mistakes and ambiguities, as shown in Table 1(a).
2. **Models based on expert-annotated data**: Data scarcity and predefined categories limit the performance, flexibility, and extensibility of these models, as shown in Table 1(b).
3. **Limitations of existing annotators**: Existing annotators mainly target specific predefined radiological findings. Modifying or adding categories requires new rules or additional manual annotation, which is particularly difficult given the hundreds of major radiological findings.

To solve these problems, the research makes three main contributions:

1. **Demonstrating the potential of GPT as an adept annotator**: With carefully designed prompts, GPT can effectively annotate chest X-ray reports.
2. **Training the BERT-based CheX-GPT model**: Trained only on data annotated by GPT, CheX-GPT is faster and more efficient than GPT while maintaining high annotation accuracy.
3. **Introducing the publicly available expert-annotated test set MIMIC-500**: This test set contains 500 cases selected from the MIMIC validation set and is used to benchmark annotator performance.

Through these contributions, the research improves annotation accuracy and demonstrates higher efficiency, flexibility, and extensibility, providing a foundation for future research on chest X-ray report annotation.
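
To make the role of an expert-annotated test set such as MIMIC-500 more concrete, the following minimal Python sketch scores a labeler's predictions against expert labels with a per-finding F1 score. The choice of F1 as the metric, the function name `f1_per_finding`, the finding names, and the toy labels are all assumptions made for illustration; none of them are taken from the paper or from MIMIC-500.

```python
from typing import Dict, List

# One report's labels: finding name -> whether the finding is present.
Labels = Dict[str, bool]


def f1_per_finding(
    expert: List[Labels], predicted: List[Labels], findings: List[str]
) -> Dict[str, float]:
    """Compute an F1 score per finding, treating expert labels as ground truth.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    scores: Dict[str, float] = {}
    for finding in findings:
        tp = fp = fn = 0
        for gold, pred in zip(expert, predicted):
            g = gold.get(finding, False)
            p = pred.get(finding, False)
            if g and p:
                tp += 1  # labeler and expert both mark the finding
            elif p and not g:
                fp += 1  # labeler marks a finding the expert did not
            elif g and not p:
                fn += 1  # labeler misses a finding the expert marked
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives.
        scores[finding] = (2 * tp / denom) if denom else 0.0
    return scores


# Toy data (made up): three reports, two findings.
expert_labels = [
    {"pneumothorax": True, "edema": False},
    {"pneumothorax": False, "edema": True},
    {"pneumothorax": True, "edema": True},
]
predicted_labels = [
    {"pneumothorax": True, "edema": False},
    {"pneumothorax": True, "edema": True},
    {"pneumothorax": False, "edema": True},
]

scores = f1_per_finding(expert_labels, predicted_labels, ["pneumothorax", "edema"])
print(scores)
```

Because the same function works for any labeler that emits per-finding labels, a rule-based labeler, GPT, and a BERT-based model could in principle be compared on the same expert-annotated cases.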