Uncovering Language Disparity of ChatGPT in Healthcare: Non-English Clinical Environment for Retinal Vascular Disease Classification (Preprint)

Xiaocong Liu, Jiageng Wu, Anwen Shao, Wei Shen, Panpan Ye, Yao Wang, Juan Ye, Kai Jin, Jie Yang
DOI: https://doi.org/10.2196/preprints.51926
2023-01-01
Abstract: Benefiting from their exceptional text-understanding ability and rich knowledge, large language models (LLMs) such as ChatGPT have shown great potential in English clinical environments. However, the performance of ChatGPT in non-English clinical settings, as well as its reasoning, has not been explored in depth. This study aimed to evaluate ChatGPT's diagnostic performance and inference abilities for retinal vascular diseases in a non-English clinical environment. In this cross-sectional study, we collected 1226 fundus fluorescein angiography reports and their corresponding diagnoses written in Chinese, and tested ChatGPT with four prompting strategies (direct diagnosis or diagnosis with explanation, each in Chinese or English). ChatGPT using English prompts for direct diagnosis achieved the best performance, with an F1-score of 80.05%, which was inferior to that of ophthalmologists (89.35%) but close to that of ophthalmologist interns (82.69%). Although ChatGPT can derive a reasoning process with a low error rate, mistakes such as misinformation (1.96%) and hallucination (0.59%) still occur. ChatGPT can serve as a helpful medical assistant for providing diagnoses in non-English clinical environments, but it still shows performance gaps, language disparity, and errors compared with professionals. These shortcomings point to its potential limitations and to the need to continue exploring more robust LLMs in ophthalmology practice. Trial registration: ClinicalTrials.gov NCT04718532
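The abstract reports F1-scores but does not state here how they are averaged across diagnostic categories. The following is a minimal sketch of how such a score could be computed from gold and predicted diagnoses. It assumes single-label, multi-class classification and macro-averaging (the unweighted mean of per-class F1), and neither assumption is confirmed by the abstract. The function name `macro_f1` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Macro-averaged F1 (assumed averaging scheme): unweighted mean of per-class F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct diagnosis for class g
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true class g
    scores = []
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, not taken from the study.
gold = ["DR", "RVO", "DR", "normal"]
pred = ["DR", "DR", "DR", "normal"]
print(round(macro_f1(gold, pred), 4))
```

With these toy labels the per-class F1 values are 0.8 (DR), 0.0 (RVO), and 1.0 (normal), so the macro average is 0.6. A micro-averaged or support-weighted score would give a different number, which is why the averaging scheme matters when comparing reported F1-scores.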