BI-RADS Category Assignments by GPT-3.5, GPT-4, and Google Bard: A Multilanguage Study

Andrea Cozzi, Katja Pinker, Andri Hidber, Tianyu Zhang, Luca Bonomo, Roberto Lo Gullo, Blake Christianson, Marco Curti, Stefania Rizzo, Filippo Del Grande, Ritse M. Mann, Simone Schiaffino, Ariane Panzer
DOI: https://doi.org/10.1148/radiol.232133
IF: 19.7
2024-05-01
Radiology
Abstract: Background: The performance of publicly available large language models (LLMs) remains unclear for complex clinical tasks. Purpose: To evaluate the agreement between human readers and LLMs for Breast Imaging Reporting and Data System (BI-RADS) categories assigned based on breast imaging reports written in three languages and to assess the impact of discordant category assignments on clinical management. Materials and Methods: This retrospective study included reports for women who underwent MRI,...
radiology, nuclear medicine & medical imaging