Abstract:Importance Ophthalmology is reliant on effective interpretation of multimodal imaging to ensure diagnostic accuracy. The new ability of ChatGPT-4 (OpenAI) to interpret ophthalmic images has not yet been explored. Objective To evaluate the performance of the novel release of an artificial intelligence chatbot that is capable of processing imaging data. Design, Setting, and Participants This cross-sectional study used a publicly available dataset of ophthalmic cases from OCTCases, a medical education platform based out of the Department of Ophthalmology and Vision Sciences at the University of Toronto, with accompanying clinical multimodal imaging and multiple-choice questions. Across 137 available cases, 136 contained multiple-choice questions (99%). Exposures The chatbot answered questions requiring multimodal input from October 16 to October 23, 2023. Main Outcomes and Measures The primary outcome was the accuracy of the chatbot in answering multiple-choice questions pertaining to image recognition in ophthalmic cases, measured as the proportion of correct responses. χ 2 Tests were conducted to compare the proportion of correct responses across different ophthalmic subspecialties. Results A total of 429 multiple-choice questions from 136 ophthalmic cases and 448 images were included in the analysis. The chatbot answered 299 of multiple-choice questions correctly across all cases (70%). The chatbot’s performance was better on retina questions than neuro-ophthalmology questions (77% vs 58%; difference = 18%; 95% CI, 7.5%-29.4%; χ 2 1 = 11.4; P &lt; .001). The chatbot achieved a better performance on nonimage–based questions compared with image-based questions (82% vs 65%; difference = 17%; 95% CI, 7.8%-25.1%; χ 2 1 = 12.2; P &lt; .001).The chatbot performed best on questions in the retina category (77% correct) and poorest in the neuro-ophthalmology category (58% correct). The chatbot demonstrated intermediate performance on questions from the ocular oncology (72% correct), pediatric ophthalmology (68% correct), uveitis (67% correct), and glaucoma (61% correct) categories. Conclusions and Relevance In this study, the recent version of the chatbot accurately responded to approximately two-thirds of multiple-choice questions pertaining to ophthalmic cases based on imaging interpretation. The multimodal chatbot performed better on questions that did not rely on the interpretation of imaging modalities. As the use of multimodal chatbots becomes increasingly widespread, it is imperative to stress their appropriate integration within medical contexts.

The ability of artificial intelligence chatbots ChatGPT and Google Bard to accurately convey pre-operative information for patients undergoing ophthalmological surgeries

Evaluating the Accuracy of ChatGPT and Google BARD in Fielding Oculoplastic Patient Queries: A Comparative Study on Artificial versus Human Intelligence

Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma

Artificial intelligence chatbots as sources of patient education material for cataract surgery: ChatGPT-4 versus Google Bard

Exploring the Role of Artificial Intelligence Chatbots in Preoperative Counseling for Head and Neck Cancer Surgery

Comparative Analysis of Accuracy, Readability, Sentiment, and Actionability: Artificial Intelligence Chatbots (ChatGPT and Google Gemini) versus Traditional Patient Information Leaflets for Local Anesthesia in Eye Surgery

Assessing the Efficacy of an AI-Powered Chatbot (ChatGPT) in Providing Information on Orthopedic Surgeries: A Comparative Study With Expert Opinion

Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients

Usefulness and Accuracy of Artificial Intelligence Chatbot Responses to Patient Questions for Neurosurgical Procedures

Artificial Intelligence Chatbots' Understanding of the Risks and Benefits of Computed Tomography and Magnetic Resonance Imaging Scenarios

Comparative Performance of Current Patient-Accessible Artificial Intelligence Large Language Models in the Preoperative Education of Patients in Facial Aesthetic Surgery

Utility and Comparative Performance of Current Artificial Intelligence Large Language Models as Postoperative Medical Support Chatbots in Aesthetic Surgery

Accuracy of an Artificial Intelligence Chatbot’s Interpretation of Clinical Ophthalmic Images

Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery

Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients

A Comparative Analysis of ChatGPT and Google’s AI’s “Bard” in Medicine

Evaluation of the Current Status of Artificial Intelligence for Endourology Patient Education: A Blind Comparison of ChatGPT and Google Bard against Traditional Information Resources

Blepharoptosis Consultation with Artificial Intelligence: Aesthetic Surgery Advice and Counseling from Chat Generative Pre-Trained Transformer (ChatGPT)

Reliability of artificial intelligence chatbot responses to frequently asked questions in breast surgical oncology

Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases

Comparison of artificial intelligence large language model chatbots in answering frequently asked questions in anaesthesia