Comparing Large Language Model and Human Reader Accuracy with New England Journal of Medicine Image Challenge Case Image Inputs

Pae Sun Suh, Woo Hyun Shim, Chong Hyun Suh, Hwon Heo, Kye Jin Park, Pyeong Hwa Kim, Se Jin Choi, Yura Ahn, Sohee Park, Ho Young Park, Na Eun Oh, Min Woo Han, Sung Tan Cho, Chang-Yun Woo, Hyungjun Park
DOI: https://doi.org/10.1148/radiol.241668
IF: 19.7
2024-12-11
Radiology
Abstract: Background: Application of multimodal large language models (LLMs) with both textual and visual capabilities has been steadily increasing, but their ability to interpret radiologic images is still doubted. Purpose: To evaluate the accuracy of LLMs and compare it with that of human readers with varying levels of experience and to assess the factors affecting LLM accuracy in answering New England Journal of Medicine Image Challenge cases. Materials and Methods: Radiologic images of cases from October...
radiology, nuclear medicine & medical imaging