Abstract: We propose a general pipeline to automate the extraction of labels from radiology reports using large language models, which we validate on spinal MRI reports. The efficacy of our labelling method is measured on five distinct conditions: spinal cancer, stenosis, spondylolisthesis, cauda equina compression and herniation. Using open-source models, our method equals or surpasses GPT-4 on a held-out set of reports. Furthermore, we show that the extracted labels can be used to train imaging models to classify the identified conditions in the accompanying MR scans. All classifiers trained using automated labels achieve comparable performance to models trained using scans manually annotated by clinicians. Code can be found at https://github.com/robinyjpark/AutoLabelClassifier.

What problem does this paper attempt to address?

This paper attempts to address the issue of time-consuming and expert-dependent annotation of medical imaging datasets. Specifically, the authors propose a general pipeline that leverages large language models (LLMs) to automatically extract labels from radiology reports, reducing the manual annotation workload and unlocking larger-scale training datasets for medical imaging problems.

### Main Issues:

1. **Time-consuming annotation of medical imaging datasets**: Annotating medical imaging datasets typically requires a significant amount of time and the involvement of expert annotators, which is both expensive and limited.
2. **Diverse medical conditions**: Various medical conditions may appear in medical imaging, leading to inconsistency in labels.
3. **Small-scale datasets**: Due to the above reasons, researchers often have to work with small-scale datasets for medical imaging problems or rely on a few publicly available datasets that cover limited conditions and modalities.

### Solution:

The authors propose a general approach that adapts general large language models (LLMs) to extract structured labels from clinical reports. The specific steps include:

1. **Model prompting**: By providing definitions of target conditions, the model is asked to generate summaries of the reports and generate binary labels based on the summaries.
2. **Self-supervised fine-tuning**: The model undergoes self-supervised fine-tuning to familiarize it with the task of summary generation.
3. **Application validation**: The method's effectiveness is validated on spine MRI radiology reports, testing for five different conditions: spinal cancer, stenosis, spondylolisthesis, cauda equina compression, and disc herniation.

### Main Contributions:

- **Automated label extraction**: This method can automatically extract labels from radiology reports, significantly reducing the manual annotation workload.
- **Superior performance**: Using open-source models, this method achieved performance comparable to or better than GPT-4 across multiple conditions.
- **Downstream applications**: The extracted labels can be used to train image models to detect relevant conditions, with performance comparable to models trained on expert-annotated images.

### Conclusion:

This paper presents a general method for automatically extracting labels from radiology reports without additional model training. The method outperforms a strong GPT-4 baseline in the application to spine MRI reports, offering privacy protection and cost-effectiveness. Additionally, the extracted labels can be used to train classifiers, achieving performance comparable to models trained on expert-annotated scans.
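
### Illustrative sketch:

The two-stage prompting step described under "Solution" (summarise the report with respect to a condition definition, then ask for a binary label based on that summary) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the condition definition text, the function names, and the `generate` callable (a stand-in for any LLM call) are all assumptions made for this example. A stub model is used so the sketch runs without any external dependency.

```python
from typing import Callable

# Hypothetical definition text; the paper's actual definitions are not given here.
STENOSIS_DEFINITION = "narrowing of the spinal canal or neural foramina"


def build_summary_prompt(report: str, condition: str, definition: str) -> str:
    """Stage 1 prompt: summarise the report with respect to one condition."""
    return (
        f"Summarise the following radiology report with respect to {condition}, "
        f"defined as: {definition}.\n\nReport:\n{report}"
    )


def build_label_prompt(summary: str, condition: str) -> str:
    """Stage 2 prompt: ask for a yes/no answer based only on the summary."""
    return (
        f"Based on this summary, does the patient have {condition}? "
        f"Answer yes or no.\n\nSummary:\n{summary}"
    )


def parse_binary_label(answer: str) -> int:
    """Map a free-text answer to 1 (present) or 0 (absent)."""
    return 1 if answer.strip().lower().startswith("yes") else 0


def label_report(
    report: str,
    condition: str,
    definition: str,
    generate: Callable[[str], str],
) -> int:
    """Run summary generation, then binary labelling, with any text generator."""
    summary = generate(build_summary_prompt(report, condition, definition))
    answer = generate(build_label_prompt(summary, condition))
    return parse_binary_label(answer)


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM so the example is runnable."""
    if prompt.startswith("Summarise"):
        return "The report describes moderate canal stenosis at L4/5."
    return "Yes" if "stenosis at" in prompt else "No"


if __name__ == "__main__":
    report_text = "Moderate central canal stenosis at L4/5. No cord compression."
    print(label_report(report_text, "stenosis", STENOSIS_DEFINITION, fake_llm))
```

In practice `fake_llm` would be replaced by a call to an open-source model or an API. Keeping the generator as a plain callable means the prompt construction and label parsing can be tested separately from the model.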

Automated Spinal MRI Labelling from Reports Using a Large Language Model
