Evaluation of Reliability, Repeatability, Robustness, and Confidence of GPT-3.5 and GPT-4 on a Radiology Board-style Examination

Satheesh Krishna, Nishaant Bhambra, Robert Bleakney, Rajesh Bhayana
DOI: https://doi.org/10.1148/radiol.232715
IF: 19.7
2024-05-22
Radiology
Abstract: Background: ChatGPT (OpenAI) can pass a text-based radiology board-style examination, but its stochasticity and confident language when it is incorrect may limit utility. Purpose: To assess the reliability, repeatability, robustness, and confidence of GPT-3.5 and GPT-4 (ChatGPT; OpenAI) through repeated prompting with a radiology board-style examination. Materials and Methods: In this exploratory prospective study, 150 radiology board-style multiple-choice text-based questions, previously used to...
radiology, nuclear medicine & medical imaging