Keeping Up With ChatGPT

Julian M.M. Rogasch, Hans V. Jochens, Giulia Metzger, Christoph Wetz, Jonas Kaufmann, Christian Furth, Holger Amthauer, Imke Schatka
DOI: https://doi.org/10.1097/rlu.0000000000005207
Impact factor: 10.6
2024-04-01
Clinical Nuclear Medicine
Abstract:
Purpose: The latest iteration of GPT-4 (generative pretrained transformer) is a large multimodal model that can integrate both text and image input, but its performance with medical images has not been systematically evaluated. We studied whether ChatGPT with GPT-4V(ision) can recognize images from common nuclear medicine examinations and interpret them.
Patients and Methods: Fifteen representative images (scintigraphy, 11; PET, 4) were submitted to ChatGPT with GPT-4V(ision), in both its Default version and its “Advanced Data Analysis (beta)” version. ChatGPT was asked to name the type of examination and the tracer, to explain the findings, and to state whether there were abnormalities. ChatGPT was also asked to mark anatomical structures or pathological findings. The appropriateness of the responses was rated by 3 nuclear medicine physicians.
Results: The Default version correctly identified the examination and the tracer in the majority of the 15 cases (60% and 53%, respectively) and gave an “appropriate” description of the findings and the abnormalities in 47% and 33% of cases, respectively. The Default version cannot manipulate images. The “Advanced Data Analysis (beta)” version failed in all tasks in >90% of cases. A “major” or “incompatible” inconsistency between 3 trials of the same prompt was observed in 73% of cases with the Default version and in 87% of cases with the “Advanced Data Analysis (beta)” version.
Conclusions: Although GPT-4V(ision) demonstrates preliminary capabilities in analyzing nuclear medicine images, it exhibits significant limitations, particularly in its reliability (i.e., correctness, predictability, and consistency).
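Because the study used only 15 images, each reported percentage corresponds to exactly one whole number of cases. The sketch below recovers those counts from the rounded percentages. The counts are inferred by arithmetic, not stated in the abstract, and the dictionary labels and helper names are illustrative only. It assumes every percentage is a share of all 15 images and is rounded to the nearest whole percent.

```python
N_CASES = 15  # 11 scintigraphy + 4 PET images, as stated in the abstract

# Percentages as reported in the abstract (rounded to whole percent).
# The label strings are hypothetical shorthand, not the authors' terms.
REPORTED = {
    "Default: examination identified": 60,
    "Default: tracer identified": 53,
    "Default: findings appropriate": 47,
    "Default: abnormalities appropriate": 33,
    "Default: major/incompatible inconsistency": 73,
    "Advanced Data Analysis: major/incompatible inconsistency": 87,
}


def count_from_percent(pct: int, n: int = N_CASES) -> int:
    """Return the unique case count k (0..n) whose share k/n rounds to pct."""
    # With n = 15 the shares step by 6.67 points, so at most one k can match.
    matches = [k for k in range(n + 1) if round(100 * k / n) == pct]
    if len(matches) != 1:
        raise ValueError(f"{pct}% is ambiguous or impossible for n={n}: {matches}")
    return matches[0]


def min_count_above(threshold_pct: float, n: int = N_CASES) -> int:
    """Smallest case count whose share of n strictly exceeds threshold_pct."""
    return min(k for k in range(n + 1) if 100 * k / n > threshold_pct)


if __name__ == "__main__":
    for label, pct in REPORTED.items():
        print(f"{label}: {pct}% -> {count_from_percent(pct)}/{N_CASES}")
    # "failed in all tasks in >90% of cases" implies at least this many images.
    print(f">90% of {N_CASES} cases -> at least {min_count_above(90)} cases")
```

Run on these values, the script maps 60% to 9 of 15 images and 53% to 8 of 15, so both are narrow majorities. It also shows that ">90%" means at least 14 of the 15 images.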
radiology, nuclear medicine & medical imaging