Abstract:

Background: The introduction of ChatGPT-4V's 'Chat with images' feature marks the beginning of the era of large multimodal models (LMMs): ChatGPT can now process uploaded images and answer questions about them. Because radiological interpretation is crucial for surgical planning and postoperative care, this advancement has the potential to transform how surgical teams use radiographic data. However, ChatGPT-4V's ability to interpret radiological images and formulate treatment plans has not yet been comprehensively evaluated.

Patients and methods: Three types of questions were collected. (1) 87 USMLE-style questions, for which only the question stems and images were submitted, without the answer options, to assess ChatGPT-4V's diagnostic capability. For questions involving treatment plan formulation, a five-point Likert scale was used to rate ChatGPT-4V's proposed treatment plan. The 87 questions were then adapted by removing detailed patient history to assess its contribution to diagnosis, and ChatGPT-4V's diagnostic performance was also tested when only the medical history was provided. (2) We randomly selected 100 chest radiographs from the ChestX-ray8 database to test ChatGPT-4V's ability to identify abnormal chest radiographs. (3) Cases from the 'Diagnosis Please' section of the journal Radiology were collected to evaluate ChatGPT-4V's performance in diagnosing complex cases. Three responses were collected for each question.

Results: ChatGPT-4V achieved a diagnostic accuracy of 77.01% on the USMLE-style questions. The average score of ChatGPT-4V's treatment plans was 3.97 (interquartile range: 3.33-4.67). Removing detailed patient history reduced diagnostic accuracy to 19.54% (P<0.0001). ChatGPT-4V achieved an AUC of 0.768 (95% CI: 0.684-0.851) in detecting abnormalities on chest radiographs, but could not specify the exact disease because detailed patient history was not provided. For the 'Diagnosis Please' cases, ChatGPT-4V provided diagnoses consistent with or very similar to the reference answers.

Conclusion: ChatGPT-4V demonstrated an impressive ability to combine patient history with radiological images to make diagnoses and to design treatment plans directly from images, suggesting its potential for future application in clinical practice.
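The results above rest on two summary statistics: a diagnostic accuracy (a percentage of correctly answered items) and an AUC with a 95% confidence interval. The Python sketch below illustrates how such figures can be computed. It is not the authors' analysis code. The abstract does not say how a continuous score was obtained for the ROC analysis, nor which CI method was used. This sketch therefore assumes, hypothetically, that each radiograph is scored by how many of its three ChatGPT-4V responses flag it as abnormal (0 to 3), and it uses a percentile bootstrap for the CI. The labels and votes are invented for illustration. A count of 67 correct out of 87 reproduces the reported 77.01%, and 17 of 87 reproduces 19.54%. However, the abstract does not state whether accuracy was computed per question or over all responses.

```python
import random


def accuracy(correct, total):
    """Diagnostic accuracy as a percentage of items answered correctly."""
    return 100.0 * correct / total


def auc(labels, scores):
    """ROC AUC via its rank interpretation (Mann-Whitney U / (n_pos * n_neg)):
    the probability that a randomly chosen abnormal image scores higher than a
    randomly chosen normal one, with ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample images with replacement, recompute AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC is only defined if both classes are present
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data: 1 = abnormal radiograph, 0 = normal.
# Score = number of the three responses that called the image abnormal (assumption).
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
votes = [3, 3, 2, 1, 3, 0, 0, 1, 0, 2, 0, 0]

print(f"accuracy: {accuracy(67, 87):.2f}%")
print(f"AUC: {auc(labels, votes):.3f}")
print("95% CI: ({:.3f}, {:.3f})".format(*bootstrap_auc_ci(labels, votes)))
```

With three responses per image, a vote count like this allows only a few distinct thresholds, so the resulting ROC curve is coarse. The bootstrap resamples whole images, so each image's score (and thus its three responses) stays together.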

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

Radiology-GPT: A Large Language Model for Radiology

LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task

MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis

Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation

RoentGen: Vision-Language Foundation Model for Chest X-ray Generation

Deep neural models for automated multi-task diagnostic scan management—quality enhancement, view classification and report generation

CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting

Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns

MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation

A vision–language foundation model for the generation of realistic chest X-ray images

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions

CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images

Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

Step into the era of large multimodal models: a pilot study on ChatGPT-4V(ision)'s ability to interpret radiological images

Patient Centric Summarization of Radiology Findings using Large Language Models

3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models

MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis

M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis