Assessing GPT-4 Multimodal Performance in Radiological Image Analysis

Dana Brin, Vera Sorin, Yiftach Barash, Eli Konen, Benjamin S Glicksberg, Girish Nadkarni, Eyal Klang
DOI: https://doi.org/10.1101/2023.11.15.23298583
2024-05-23
Abstract:
Objectives: This study assesses how well OpenAI's multimodal GPT-4 (GPT-4V), which can analyze both images and text, interprets radiological images. It covers a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI to enhance diagnostic processes in radiology.
Methods: Using GPT-4V, we analyzed 230 anonymized emergency room diagnostic images collected consecutively over one week. Modalities included ultrasound (US), computerized tomography (CT), and X-ray. GPT-4V's interpretations were compared with those of senior radiologists to evaluate its accuracy in recognizing the imaging modality, the anatomical region, and the pathology present in each image.
Results: GPT-4V identified the imaging modality correctly in 100% of cases (221/221), the anatomical region in 87.1% (189/217), and the pathology in 35.2% (76/216). Performance varied significantly across modalities. Anatomical region identification accuracy was 60.9% (39/64) for US, 97% (98/101) for CT, and 100% (52/52) for X-ray images (p < 0.001). Similarly, pathology identification accuracy was 9.1% (6/66) for US, 36.4% (36/99) for CT, and 66.7% (34/51) for X-ray images (p < 0.001). These variations indicate inconsistencies in GPT-4V's ability to interpret radiological images accurately.
Conclusion: Although integrating AI into radiology, as exemplified by multimodal GPT-4, offers promising avenues for diagnostic enhancement, GPT-4V is not yet reliable for interpreting radiological images. This study underscores the need for further development before such models can perform dependably in diagnostic radiology.
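The per-modality accuracies reported in the abstract can be rechecked directly from the stated counts. The sketch below does that and also runs a cross-modality comparison. It assumes a Pearson chi-square test of independence on the 2 x 3 table (correct/incorrect by modality); the abstract reports only p < 0.001 and does not name the test the authors used. The function names `accuracy_pct` and `chi_square_2xk` are hypothetical helpers written for illustration. For a 2 x 3 table the test has 2 degrees of freedom, and in that case the chi-square survival function has the exact closed form exp(-x/2). This keeps the example dependency-free.

```python
import math

# Per-modality (correct, total) counts as reported in the abstract.
ANATOMY = {"US": (39, 64), "CT": (98, 101), "X-ray": (52, 52)}
PATHOLOGY = {"US": (6, 66), "CT": (36, 99), "X-ray": (34, 51)}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def chi_square_2xk(counts: dict[str, tuple[int, int]]) -> tuple[float, float]:
    """Pearson chi-square test of independence on a 2 x k table.

    Rows are correct/incorrect and columns are modalities. This test is an
    assumption: the abstract reports p < 0.001 without naming the test used.
    """
    cols = [(c, t - c) for c, t in counts.values()]  # (correct, incorrect) per modality
    n = sum(c + i for c, i in cols)
    row_totals = (sum(c for c, _ in cols), sum(i for _, i in cols))
    stat = 0.0
    for c, i in cols:
        col_total = c + i
        for observed, row_total in zip((c, i), row_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    df = len(cols) - 1  # (2 rows - 1) * (k columns - 1)
    if df != 2:
        raise ValueError("the closed-form p-value below is only valid for df = 2")
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


if __name__ == "__main__":
    for name, table in (("anatomical region", ANATOMY), ("pathology", PATHOLOGY)):
        pcts = {m: accuracy_pct(c, t) for m, (c, t) in table.items()}
        stat, p = chi_square_2xk(table)
        print(f"{name}: {pcts}, chi2 = {stat:.1f}, p = {p:.1e}")
```

Under this assumed test, both comparisons fall well below the 0.001 threshold. That is consistent with the abstract, but it does not confirm which analysis the authors actually performed. For tables with more modalities, a general chi-square survival function (for example, from SciPy) would replace the df = 2 shortcut.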