A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis

Junxiu Zhang, Yao Ma, Rong Zhang, Yanhua Chen, Mengyao Xu, Su Rina, Ke Ma
DOI: https://doi.org/10.1038/s41598-024-80917-x
IF: 4.6
2024-12-06
Scientific Reports
Abstract: Artificial intelligence (AI), particularly large language models like GPT-4o, holds promise for enhancing diagnostic accuracy in healthcare. This study evaluates the diagnostic performance of GPT-4o compared to human ophthalmologists in glaucoma cases. A prospective, observational study was conducted at a tertiary care ophthalmology center. Twenty-six glaucoma cases, including both primary and secondary types, were selected from publicly available databases and institutional records. The cases were analyzed by GPT-4o and three ophthalmologists with varying levels of experience. The accuracy and completeness of primary and differential diagnoses were assessed using 10-point and 6-point Likert scales, respectively. Statistical analyses were performed using nonparametric methods, including the Kruskal–Wallis and Mann–Whitney U tests. GPT-4o was significantly less accurate in primary diagnosis than the human ophthalmologists. Specifically, GPT-4o achieved a mean score of 5.500 (p < 0.001) compared to Doctor C, who had the highest score of 8.038 (p < 0.001). The completeness score for GPT-4o (3.077; p < 0.001) was also lower than that of Doctor B, who had the lowest score (3.615; p < 0.001) among the human ophthalmologists. However, for differential diagnosis, GPT-4o (7.577) showed comparable accuracy to Doctor A (7.615) and Doctor C (7.673) (p < 0.0001) while achieving the highest completeness score (4.096), outperforming Doctor C (3.846), Doctor A (2.923), and Doctor B (2.808) (p < 0.0001). AI, including GPT-4o, is currently not an acceptable standalone method for diagnosing glaucoma due to its lower accuracy compared to human clinicians. These findings suggest that GPT-4o could serve as a valuable adjunct in clinical practice, particularly in complex cases, but should not replace human expertise, especially for initial diagnoses. Future improvements in AI models could enhance their utility in ophthalmology.
multidisciplinary sciences
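
The abstract names the Kruskal–Wallis and Mann–Whitney U tests as the nonparametric methods used to compare rater scores. The following is a minimal pure-Python sketch of how those two test statistics are computed from Likert-style scores. It is not the authors' analysis code: the function names (average_ranks, kruskal_wallis_h, mann_whitney_u) are hypothetical, and any scores passed in would be illustrative rather than data from the study. It returns test statistics only and leaves out the p-value calculation.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied-value counts t.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U statistic for sample_a (rank sum minus its minimum)."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:len(sample_a)])
    n_a = len(sample_a)
    return rank_sum_a - n_a * (n_a + 1) / 2
```

In an analysis of this kind, kruskal_wallis_h would typically take one list of scores per rater (for example, GPT-4o and each of the three doctors), and mann_whitney_u would then be applied to pairs of raters as a follow-up comparison. Both statistics work on ranks rather than raw values, which suits ordinal Likert scores.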