Transforming free-text radiology reports into structured reports using ChatGPT: A study on thyroid ultrasonography
Huan Jiang, ShuJun Xia, YiXuan Yang, JiaLe Xu, Qing Hua, ZiHan Mei, YiQing Hou, MinYan Wei, LiMei Lai, Ning Li, YiJie Dong, JianQiao Zhou
DOI: https://doi.org/10.1016/j.ejrad.2024.111458
IF: 4.531
2024-04-11
European Journal of Radiology
Abstract: Purpose: Structured radiology reports are widely recognized as important because they facilitate efficient data extraction and promote collaboration among healthcare professionals. We aimed to assess the accuracy and reproducibility of ChatGPT, a large language model, in generating structured thyroid ultrasound reports. Methods: This retrospective study included 184 nodules described in 136 thyroid ultrasound reports from 136 patients. ChatGPT-3.5 and ChatGPT-4.0 were used to structure the reports according to the ACR-TIRADS guidelines. Two radiologists evaluated the responses for quality, nodule categorization accuracy, and management recommendations. Each text was submitted twice to assess the consistency of the nodule classification and management recommendations. Results: Across 136 ultrasound reports from 136 patients (mean age, 52 years ± 12 [SD]; 61 male), ChatGPT-3.5 generated 202 satisfactory structured reports, whereas ChatGPT-4.0 produced only 69 (74.3% vs. 25.4%, odds ratio (OR) = 8.490, 95% CI: 5.775–12.481, p < 0.001). ChatGPT-4.0 outperformed ChatGPT-3.5 in categorizing thyroid nodules, with an accuracy of 69.3% compared to 34.5% (OR = 4.282, 95% CI: 3.145–5.831, p < 0.001). ChatGPT-4.0 also provided more comprehensive or correct management recommendations than ChatGPT-3.5 (OR = 1.791, 95% CI: 1.297–2.473, p < 0.001). Finally, ChatGPT-4.0 exhibited higher consistency than ChatGPT-3.5 in categorizing nodules (ICC = 0.732 vs. ICC = 0.429), and both models exhibited moderate consistency in management recommendations (ICC = 0.549 vs. ICC = 0.575). Conclusions: Our study demonstrates the potential of ChatGPT for transforming free-text thyroid ultrasound reports into structured formats. ChatGPT-3.5 excels at generating structured reports, while ChatGPT-4.0 shows superior accuracy in nodule categorization and management recommendations.
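The first odds ratio in the Results can be checked from the counts given in the abstract. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that each model produced 272 structured reports in total (136 reports, each submitted twice, which is consistent with 202/272 ≈ 74.3% and 69/272 ≈ 25.4%), and that the interval is a standard Wald interval on the log odds ratio. The function name `odds_ratio_wald` and its parameters are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: successes and failures in group 1
    c, d: successes and failures in group 2
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Assumption: 136 reports x 2 submissions = 272 outputs per model.
TOTAL = 136 * 2
SATISFACTORY_35 = 202  # ChatGPT-3.5, from the abstract
SATISFACTORY_40 = 69   # ChatGPT-4.0, from the abstract

or_, lo, hi = odds_ratio_wald(
    SATISFACTORY_35, TOTAL - SATISFACTORY_35,
    SATISFACTORY_40, TOTAL - SATISFACTORY_40,
)
print(f"OR = {or_:.3f}, 95% CI: {lo:.3f}-{hi:.3f}")
```

Under these assumptions the computed values agree closely with the reported OR of 8.490 and interval of 5.775–12.481, which suggests, but does not confirm, that the authors treated the two submissions of each report as separate observations.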
radiology, nuclear medicine & medical imaging