Abstract:

Background: The specialization and complexity of radiology make the automatic generation of radiologic impressions (ie, a diagnosis with differential diagnosis and management recommendations) challenging.

Purpose: To develop a large language model (LLM) that generates impressions based on imaging findings and to evaluate its performance in professional and linguistic dimensions.

Materials and Methods: Six radiologists recorded imaging examination findings from August 2 to 31, 2023, at Shanghai General Hospital and used the developed LLM before routinely writing report impressions for multiple radiologic modalities (CT, MRI, radiography, mammography) and anatomic sites (cranium and face, neck, chest, upper abdomen, lower abdomen, vessels, bone and joint, spine, breast), making necessary corrections and completing the radiologic impression. A subset was defined to investigate cases where the LLM-generated impressions differed from the final radiologist impressions by excluding identical and highly similar cases. An expert panel scored the LLM-generated impressions on a five-point Likert scale (5 = strongly agree) based on scientific terminology, coherence, specific diagnosis, differential diagnosis, management recommendations, correctness, comprehensiveness, harmlessness, and lack of bias.

Results: In this retrospective study, an LLM was pretrained using 20 GB of medical and general-purpose text data. The fine-tuning data set comprised 1.5 GB of data, including 800 radiology reports with paired instructions (describing the output task in natural language) and outputs. Test set 2 included data from 3988 patients (median age, 56 years [IQR, 40-68 years]; 2159 male). The median recall, precision, and F1 score of LLM-generated impressions were 0.775 (IQR, 0.56-1), 0.84 (IQR, 0.611-1), and 0.772 (IQR, 0.578-0.957), respectively, using the final impressions as the reference standard. In a subset of 1014 patients (median age, 57 years [IQR, 42-69 years]; 528 male), the overall median expert panel score for LLM-generated impressions was 5 (IQR, 5-5), ranging from 4 (IQR, 3-5) to 5 (IQR, 5-5).

Conclusion: The developed LLM generated radiologic impressions that were professionally and linguistically appropriate for a full spectrum of radiology examinations.

© RSNA, 2024

Supplemental material is available for this article.
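
The abstract reports recall, precision, and F1 score against the final radiologist impression but does not say how the overlap between the two texts was measured. As an illustration only, the sketch below assumes whitespace tokenization and clipped token overlap; the function name `overlap_scores`, the `Scores` type, and the example sentences are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import NamedTuple


class Scores(NamedTuple):
    recall: float
    precision: float
    f1: float


def overlap_scores(generated: str, reference: str) -> Scores:
    """Token-overlap recall, precision, and F1 of a generated impression.

    Assumption: lowercase whitespace tokens; the study's actual
    matching unit (tokens, characters, entities) is not stated.
    """
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    if not gen_tokens or not ref_tokens:
        return Scores(0.0, 0.0, 0.0)

    # Multiset intersection: each token counts at most as often
    # as it appears in both texts.
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())

    recall = overlap / len(ref_tokens)      # share of reference recovered
    precision = overlap / len(gen_tokens)   # share of output that is supported
    if recall + precision == 0:
        return Scores(0.0, 0.0, 0.0)
    f1 = 2 * precision * recall / (precision + recall)
    return Scores(recall, precision, f1)


if __name__ == "__main__":
    # Hypothetical example pair, not from the study data.
    llm_text = "no acute intracranial hemorrhage"
    final_text = "no acute intracranial hemorrhage or infarct"
    print(overlap_scores(llm_text, final_text))
```

Under these assumptions, a generated impression that omits part of the final impression keeps high precision but loses recall, which is the pattern the reported medians (precision 0.84, recall 0.775) would be consistent with. Per-case scores could then be summarized with `statistics.median`.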

Patient Centric Summarization of Radiology Findings using Large Language Models

Evaluation of large language models performance against humans for summarizing MRI knee radiology reports: A feasibility study

AI-Assisted Summarization of Radiologic Reports: Evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical

Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary

Synoptic Reporting by Summarizing Cancer Pathology Reports using Large Language Models

The current status of large language models in summarizing radiology report impressions

A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients

Multilingual Natural Language Processing Model for Radiology Reports -- The Summary is all you need!

Enhanced Electronic Health Records Text Summarization Using Large Language Models

Constructing a Large Language Model to Generate Impressions from Findings in Radiology Reports

Adapting Large Language Models for Automated Summarisation of Electronic Medical Records in Clinical Coding

Translating musculoskeletal radiology reports into patient-friendly summaries using ChatGPT-4

Multi-modal large language models in radiology: principles, applications, and potential

Sexual hormone fluctuation in chinchillas.

Adapted large language models can outperform medical experts in clinical text summarization

Evaluating Large Language Models for Radiology Natural Language Processing

Learning to Generate Radiology Findings from Impressions Based on Large Language Model

Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review

Coarctation co-existing with tetralogy of Fallot and pulmonary atresia

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

Using large language models for safety-related table summarization in clinical study reports