Automated Explainable Multidimensional Deep Learning Platform of Retinal Images for Retinopathy of Prematurity Screening
Ji Wang, Jie Ji, Mingzhi Zhang, Jian-Wei Lin, Guihua Zhang, Weifen Gong, Ling-Ping Cen, Yamei Lu, Xuelin Huang, Dingguo Huang, Taiping Li, Tsz Kin Ng, Chi Pui Pang
DOI: https://doi.org/10.1001/jamanetworkopen.2021.8758
2021-05-03
Abstract: Importance: Diagnosis of retinopathy of prematurity (ROP) currently relies on indirect ophthalmoscopy performed by experienced ophthalmologists. A deep learning algorithm based on retinal images may facilitate early detection and timely treatment of ROP to improve visual outcomes. Objective: To develop a retinal image-based, multidimensional, automated deep learning platform for ROP screening and to validate its performance. Design, setting, and participants: A total of 14 108 eyes of 8652 preterm infants who received ROP screening at 4 centers from November 4, 2010, to November 14, 2019, were included, and a total of 52 249 retinal images were randomly split into training, validation, and test sets. Four independent classifiers, one for each main dimension, were developed: image quality, any stage of ROP, intraocular hemorrhage, and preplus/plus disease. A referral-warranted ROP result was automatically generated by integrating the results of the 4 classifiers at the image, eye, and patient levels. DeepSHAP, a method based on DeepLIFT and Shapley values (a solution concept in cooperative game theory), was adopted as the heat map technology to explain the predictions. The performance of the platform was further validated by comparison with that of experienced ROP experts. Data were analyzed from February 12, 2020, to June 24, 2020. Exposure: A deep learning algorithm. Main outcomes and measures: The performance measures for each classifier included true-negative, false-positive, false-negative, and true-positive counts; F1 score; sensitivity; specificity; receiver operating characteristic curve; area under the curve (AUC); and Cohen unweighted κ. Results: A total of 14 108 eyes of 8652 preterm infants (mean [SD] gestational age, 32.9 [3.1] weeks; 4818 boys [60.4%] of 7973 with known sex) received ROP screening.
The classifiers achieved F1 scores of 0.718 to 0.981, sensitivities of 0.918 to 0.982, specificities of 0.949 to 0.992, and AUCs of 0.983 to 0.998, whereas the referral system achieved F1 scores of 0.898 to 0.956, sensitivities of 0.981 to 0.986, specificities of 0.939 to 0.974, and AUCs of 0.9901 to 0.9956. Fine-grained and class-discriminative heat maps were generated by DeepSHAP in real time. The platform achieved a Cohen unweighted κ of 0.86 to 0.98, compared with a Cohen κ of 0.93 to 0.98 for the ROP experts. Conclusions and relevance: In this diagnostic study, an automated ROP screening platform was able to identify and classify multidimensional pathologic lesions in retinal images. This platform may be able to assist routine ROP screening in general and children's hospitals.
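The outcome measures named in the abstract (sensitivity, specificity, F1 score, and Cohen unweighted κ) can all be derived from the four confusion-matrix counts of a binary classifier. The following is a minimal sketch of those standard formulas, not the authors' evaluation code. The class name `ConfusionCounts`, the function names, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the study)."""
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives
    tp: int  # true positives

    @property
    def total(self) -> int:
        return self.tn + self.fp + self.fn + self.tp


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of actual positives that are predicted positive.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of actual negatives that are predicted negative.
    return c.tn / (c.tn + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and sensitivity, written in count form.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def cohen_kappa(c: ConfusionCounts) -> float:
    # Unweighted kappa: observed agreement corrected for chance agreement.
    n = c.total
    observed = (c.tp + c.tn) / n
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up counts for illustration only; these are not results from the paper.
example = ConfusionCounts(tn=90, fp=10, fn=5, tp=45)
metrics = {
    "sensitivity": sensitivity(example),
    "specificity": specificity(example),
    "f1": f1_score(example),
    "kappa": cohen_kappa(example),
}
```

The sketch omits guards for empty classes (for example, no positive cases), which would cause a division by zero. Because F1 ignores true negatives and depends on false positives relative to true positives, it can fall well below sensitivity and specificity when positive cases are rare, which is one possible reading of the wide F1 range reported above.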