Fine-Tuning In-House Large Language Models to Infer Differential Diagnosis from Radiology Reports

Luoyao Chen, Revant Teotia, Antonio Verdone, Aidan Cardall, Lakshay Tyagi, Yiqiu Shen, Sumit Chopra
2024-10-12
Abstract: Radiology reports summarize key findings and differential diagnoses derived from medical imaging examinations. The extraction of differential diagnoses is crucial for downstream tasks, including patient management and treatment planning. However, the unstructured nature of these reports, characterized by diverse linguistic styles and inconsistent formatting, presents significant challenges. Although proprietary large language models (LLMs) such as GPT-4 can effectively retrieve clinical information, their use is limited in practice by high costs and concerns over the privacy of protected health information (PHI). This study introduces a pipeline for developing in-house LLMs tailored to identify differential diagnoses from radiology reports. We first utilize GPT-4 to create 31,056 labeled reports, then fine-tune an open-source LLM using this dataset. Evaluated on a set of 1,067 reports annotated by clinicians, the proposed model achieves an average F1 score of 92.1%, which is on par with GPT-4 (90.8%). Through this study, we provide a methodology for constructing in-house LLMs that match the performance of GPT, reduce dependence on expensive proprietary models, and enhance the privacy and security of PHI.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the automatic extraction of differential diagnoses from radiology reports. A radiology report typically contains the key findings of a medical imaging examination together with a list of possible diagnoses, and this information is crucial for subsequent patient management and treatment planning. Because the reports are unstructured, with diverse language styles and inconsistent formats, extracting the differential diagnoses from them is difficult. Proprietary large language models such as GPT-4 perform well on clinical information retrieval, but their high cost and concerns about the privacy of protected health information (PHI) limit their practical use. The study therefore proposes a method for developing an in-house large language model (LLM) that identifies differential diagnoses in radiology reports. Specifically, the research team first used GPT-4 to generate 31,056 labeled reports and then fine-tuned an open-source LLM on these data. On 1,067 reports annotated by clinicians, the proposed model achieved an average F1 score of 92.1%, comparable to GPT-4 (90.8%). The study thus provides a framework for building an in-house LLM that matches the performance of GPT, reduces dependence on expensive proprietary models, and improves the security and privacy protection of PHI.
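To make the reported metric concrete, the following is a minimal sketch of how an average F1 score over extracted diagnosis lists could be computed. It is not the authors' evaluation code: the summary does not state the matching rule or the averaging scheme, so exact matching after lowercasing and whitespace normalization, per-report averaging, and scoring two empty lists as 1.0 are all assumptions made here for illustration. The function names `normalize`, `report_f1`, and `average_f1` are hypothetical.

```python
from typing import Iterable, List, Set


def normalize(diagnosis: str) -> str:
    """Lowercase and collapse whitespace (assumed matching rule)."""
    return " ".join(diagnosis.lower().split())


def report_f1(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """F1 between the predicted and reference diagnosis sets of one report."""
    pred: Set[str] = {normalize(d) for d in predicted}
    ref: Set[str] = {normalize(d) for d in reference}
    if not pred and not ref:
        # Assumption: a report with no diagnoses, correctly predicted as
        # empty, counts as perfect agreement.
        return 1.0
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_f1(predictions: List[List[str]], references: List[List[str]]) -> float:
    """Mean of per-report F1 scores (assumed averaging scheme)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    scores = [report_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

For example, a report where the model returns "Pneumonia" and "atelectasis" while the clinician annotation lists "pneumonia" and "pulmonary edema" has one shared diagnosis out of two on each side, so precision and recall are both 0.5 and the per-report F1 is 0.5.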