Evaluation of Radiology Residents' Reporting Skills Using Large Language Models: An Observational Study

Natsuko Atsukawa, Hiroyuki Tatekawa, Tatsushi Oura, Shu Matsushita, Daisuke Horiuchi, Hirotaka Takita, Yasuhito Mitsuyama, Ayako Omori, Taro Shimono, Yukio Miki, Daiju Ueda
DOI: https://doi.org/10.1101/2024.11.06.24316838
2024-11-06
Abstract:
Background: Large language models (LLMs) have the potential to objectively evaluate radiology resident reports; however, research on their use for feedback in radiology training and for assessing resident skill development remains limited.
Purpose: This study aimed to assess the effectiveness of LLMs in revising radiology reports by comparing them with reports verified by board-certified radiologists, and to analyze the progression of residents' reporting skills over time.
Materials and methods: To identify the LLM that best aligned with human radiologists, 100 reports were randomly selected from a total of 7376 reports authored by nine first-year radiology residents. The reports were evaluated based on six criteria: (1) Addition of missing positive findings, (2) Deletion of findings, (3) Addition of negative findings, (4) Correction of the expression of findings, (5) Correction of the diagnosis, and (6) Proposal of additional examinations or treatments. Reports were segmented into four time-based terms, and 900 reports (450 CT and 450 MRI) were randomly chosen from the initial and final terms of the residents' first year. The revision rates for each criterion were compared between the first and last terms using the Wilcoxon signed-rank test.
Results: Among the LLMs tested, GPT-4o demonstrated the highest level of agreement with board-certified radiologists. Using GPT-4o, significant improvements were noted in Criteria 1-3 between the first and last terms (all P < 0.023). In contrast, no significant changes were observed for Criteria 4-6. Nevertheless, all criteria except Criterion 6 showed progressive improvement over time.
Conclusion: LLMs can effectively provide feedback on commonly corrected areas in radiology reports, enabling residents to objectively identify and improve their weaknesses and monitor their progress. Additionally, LLMs may help reduce the workload of radiologist mentors.
Radiology and Imaging
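
The abstract compares per-criterion revision rates between the first and last terms with a Wilcoxon signed-rank test. The following is a minimal sketch of how such a paired comparison could be computed for nine residents. It is not the authors' code: the function names are hypothetical, the rates are invented placeholder values rather than study data, and the exact two-sided p-value by full enumeration is an assumption, since the abstract does not state which variant of the test was used.

```python
from itertools import product


def average_ranks(values):
    """Rank values in ascending order (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(first, last):
    """Return (statistic, exact two-sided p-value) for paired samples.

    Zero differences are dropped. The null distribution is built by
    enumerating every sign assignment, which is feasible for small n.
    """
    diffs = [a - b for a, b in zip(first, last) if a != b]
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total - w_plus)

    # Count sign assignments at least as extreme as the observed statistic.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= statistic:
            hits += 1
    return statistic, hits / 2 ** len(ranks)


# Placeholder revision rates (percent of reports revised for one criterion),
# one value per resident. These numbers are invented for illustration only.
first_term = [30, 25, 40, 22, 35, 28, 31, 27, 33]
last_term = [18, 20, 26, 21, 19, 24, 15, 29, 17]

statistic, p_value = wilcoxon_signed_rank(first_term, last_term)
print(f"W = {statistic}, exact two-sided p = {p_value:.4f}")
```

With only nine pairs, enumerating all 2^9 sign assignments is cheap and avoids relying on a normal approximation. For larger samples a library routine such as `scipy.stats.wilcoxon` would be the more practical choice.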