APPLICATIONS OF MULTIMODAL GENERATIVE ARTIFICIAL INTELLIGENCE IN A REAL-WORLD RETINA CLINIC SETTING

Seyyedehfatemeh Ghalibafan, David J Taylor Gonzalez, Louis Z Cai, Brandon Graham Chou, Sugi Panneerselvam, Spencer Conrad Barrett, Mak B Djulbegovic, Nicolas A Yannuzzi
DOI: https://doi.org/10.1097/IAE.0000000000004204
2024-10-01
Retina
Abstract: Purpose: This study evaluates a large language model, Generative Pre-trained Transformer 4 with vision, for diagnosing vitreoretinal diseases in real-world ophthalmology settings. Methods: A retrospective cross-sectional study at Bascom Palmer Eye Clinic, analyzing patient data from January 2010 to March 2023, assessed Generative Pre-trained Transformer 4 with vision's performance on retinal image analysis and International Classification of Diseases 10th revision coding across 2 patient groups: simpler cases (Group A) and complex cases (Group B) requiring more in-depth analysis. Diagnostic accuracy was assessed through open-ended questions (OEQs) and multiple-choice questions, with answers independently verified by three retina specialists. Results: In 256 eyes from 143 patients, Generative Pre-trained Transformer 4 with vision demonstrated a diagnostic accuracy of 13.7% for open-ended questions and 31.3% for multiple-choice questions, with International Classification of Diseases 10th revision code accuracies of 5.5% and 31.3%, respectively. The model accurately diagnosed posterior vitreous detachment, nonexudative age-related macular degeneration, and retinal detachment. International Classification of Diseases 10th revision coding was most accurate for nonexudative age-related macular degeneration, central retinal vein occlusion, and macular hole in OEQs, and for posterior vitreous detachment, nonexudative age-related macular degeneration, and retinal detachment in multiple-choice questions. No significant difference in diagnostic or coding accuracy was found between Groups A and B. Conclusion: Generative Pre-trained Transformer 4 with vision has potential in clinical care and record keeping, particularly with standardized questions. Its accuracy in open-ended scenarios is low, indicating a significant limitation in providing complex medical advice.