Abstract:

Importance: Ophthalmology is reliant on effective interpretation of multimodal imaging to ensure diagnostic accuracy. The new ability of ChatGPT-4 (OpenAI) to interpret ophthalmic images has not yet been explored.

Objective: To evaluate the performance of the novel release of an artificial intelligence chatbot that is capable of processing imaging data.

Design, Setting, and Participants: This cross-sectional study used a publicly available dataset of ophthalmic cases from OCTCases, a medical education platform based out of the Department of Ophthalmology and Vision Sciences at the University of Toronto, with accompanying clinical multimodal imaging and multiple-choice questions. Of 137 available cases, 136 (99%) contained multiple-choice questions.

Exposures: The chatbot answered questions requiring multimodal input from October 16 to October 23, 2023.

Main Outcomes and Measures: The primary outcome was the accuracy of the chatbot in answering multiple-choice questions pertaining to image recognition in ophthalmic cases, measured as the proportion of correct responses. χ² tests were conducted to compare the proportion of correct responses across different ophthalmic subspecialties.

Results: A total of 429 multiple-choice questions from 136 ophthalmic cases and 448 images were included in the analysis. The chatbot answered 299 of the 429 multiple-choice questions correctly across all cases (70%). The chatbot's performance was better on retina questions than on neuro-ophthalmology questions (77% vs 58%; difference = 18%; 95% CI, 7.5%-29.4%; χ²₁ = 11.4; P < .001). The chatbot performed better on nonimage-based questions than on image-based questions (82% vs 65%; difference = 17%; 95% CI, 7.8%-25.1%; χ²₁ = 12.2; P < .001). By subspecialty, the chatbot performed best on questions in the retina category (77% correct) and worst in the neuro-ophthalmology category (58% correct). It demonstrated intermediate performance on questions from the ocular oncology (72% correct), pediatric ophthalmology (68% correct), uveitis (67% correct), and glaucoma (61% correct) categories.

Conclusions and Relevance: In this study, the recent version of the chatbot accurately responded to approximately two-thirds of multiple-choice questions pertaining to ophthalmic cases based on imaging interpretation. The multimodal chatbot performed better on questions that did not rely on the interpretation of imaging modalities. As the use of multimodal chatbots becomes increasingly widespread, it is imperative to stress their appropriate integration within medical contexts.
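
The outcome measures reported in the abstract (a proportion of correct responses, a difference between two proportions with a 95% CI, and a χ² statistic with 1 degree of freedom) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names are hypothetical, the Wald interval and the uncorrected Pearson χ² are assumptions about the method, and the per-group counts in the example are invented placeholders because the abstract does not report them. Only the overall 299 of 429 figure comes from the abstract.

```python
import math


def accuracy(correct, total):
    """Proportion of correct responses."""
    return correct / total


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square (1 df, no continuity correction) comparing two proportions."""
    wrong_a = total_a - correct_a
    wrong_b = total_b - correct_b
    n = total_a + total_b
    total_correct = correct_a + correct_b
    total_wrong = wrong_a + wrong_b
    cells = [
        (correct_a, total_a, total_correct),
        (wrong_a, total_a, total_wrong),
        (correct_b, total_b, total_correct),
        (wrong_b, total_b, total_wrong),
    ]
    stat = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / n
        stat += (observed - expected) ** 2 / expected
    # With 1 df the statistic is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def difference_ci(correct_a, total_a, correct_b, total_b, z=1.96):
    """Difference in proportions with a Wald 95% CI (an assumed interval type)."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    return diff, diff - z * se, diff + z * se


# Overall accuracy, using the counts stated in the abstract.
print(round(accuracy(299, 429), 2))  # 0.7

# HYPOTHETICAL group counts, for illustration only (not from the study).
stat, p = chi_square_2x2(80, 100, 60, 100)
diff, low, high = difference_ci(80, 100, 60, 100)
print(f"chi2 = {stat:.1f}, P = {p:.4f}, diff = {diff:.2f} ({low:.2f} to {high:.2f})")
```

Substituting the study's actual per-category counts for the placeholder values would be needed to reproduce the reported statistics; a different interval method or a continuity correction would give slightly different numbers.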

Accuracy of an Artificial Intelligence Chatbot’s Interpretation of Clinical Ophthalmic Images
