Application of Comprehensive Artificial Intelligence Retinal Expert (CARE) System: a National Real-World Evidence Study.
Duoru Lin,Jianhao Xiong,Congxin Liu,Lanqin Zhao,Zhongwen Li,Shanshan Yu,Xiaohang Wu,Zongyuan Ge,Xinyue Hu,Bin Wang,Meng Fu,Xin Zhao,Xin Wang,Yi Zhu,Chuan Chen,Tao Li,Yonghao Li,Wenbin Wei,Mingwei Zhao,Jianqiao Li,Fan Xu,Lin Ding,Gang Tan,Yi Xiang,Yongcheng Hu,Ping Zhang,Yu Han,Ji-Peng Olivia Li,Lai Wei,Pengzhi Zhu,Yizhi Liu,Weirong Chen,Daniel S. W. Ting,Tien Y. Wong,Yuzhong Chen,Haotian Lin
DOI: https://doi.org/10.1016/s2589-7500(21)00086-8
2021-01-01
Abstract: Background Medical artificial intelligence (AI) has entered the clinical implementation phase, although real-world performance of deep-learning systems (DLSs) for screening fundus disease remains unsatisfactory. Our study aimed to train a clinically applicable DLS for fundus diseases using data derived from the real world, and externally test the model using fundus photographs collected prospectively from the settings in which the model would most likely be adopted.
Methods In this national real-world evidence study, we trained a DLS, the Comprehensive AI Retinal Expert (CARE) system, to identify the 14 most common retinal abnormalities using 207,228 colour fundus photographs derived from 16 clinical settings with different disease distributions. CARE was internally validated using 21,867 photographs and externally tested using 18,136 photographs prospectively collected from 35 real-world settings across China where CARE might be adopted, including eight tertiary hospitals, six community hospitals, and 21 physical examination centres. The performance of CARE was further compared with that of 16 ophthalmologists and tested using datasets with non-Chinese ethnicities and previously unused camera types. This study was registered with ClinicalTrials.gov, NCT04213430, and is currently closed.
Findings The area under the receiver operating characteristic curve (AUC) in the internal validation set was 0.955 (SD 0.046). AUC values in the external test set were 0.965 (0.035) in tertiary hospitals, 0.983 (0.031) in community hospitals, and 0.953 (0.042) in physical examination centres. The performance of CARE was similar to that of ophthalmologists. Large variations in sensitivity were observed among the ophthalmologists in different regions and with varying experience. The system retained strong identification performance when tested using the non-Chinese dataset (AUC 0.960, 95% CI 0.957–0.964 in referable diabetic retinopathy).
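The Findings report each AUC with a standard deviation, for example 0.955 (SD 0.046). The abstract does not say how these summaries were formed. One plausible reading, given that CARE identifies 14 abnormalities, is a mean and SD over per-abnormality AUCs. The Python sketch below illustrates that reading only. The function names, the toy labels and scores, and the choice of sample SD are assumptions for illustration and are not taken from the study.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Equivalent to the Mann-Whitney U statistic; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(per_abnormality):
    """Mean and sample SD of AUC across abnormalities (assumed aggregation)."""
    aucs = [auc(y, s) for y, s in per_abnormality.values()]
    return mean(aucs), stdev(aucs)


# Hypothetical toy data: two abnormalities, four photographs each.
toy = {
    "abnormality_a": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "abnormality_b": ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),
}

mean_auc, sd_auc = summarize(toy)
print(f"AUC {mean_auc:.3f} (SD {sd_auc:.3f})")
```

The pairwise comparison is quadratic in the number of photographs, which is acceptable for illustration. A rank-based implementation would be preferable for datasets the size of the external test set.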
Interpretation Our DLS (CARE) showed satisfactory performance for screening multiple retinal abnormalities in real-world settings using prospectively collected fundus photographs, which supports its implementation and adoption in clinical care. Copyright © 2021 The Author(s). Published by Elsevier Ltd.