External Validation of a Retinopathy of Prematurity Screening Model Using Artificial Intelligence in 3 Low- and Middle-Income Populations
Aaron S Coyner, Minn A Oh, Parag K Shah, Praveer Singh, Susan Ostmo, Nita G Valikodath, Emily Cole, Tala Al-Khaled, Sanyam Bajimaya, Sagun K C, Tsengelmaa Chuluunbat, Bayalag Munkhuu, Prema Subramanian, Narendran Venkatapathy, Karyn E Jonas, Joelle A Hallak, R V Paul Chan, Michael F Chiang, Jayashree Kalpathy-Cramer, J Peter Campbell
DOI: https://doi.org/10.1001/jamaophthalmol.2022.2135
2022-08-01
Abstract:
Importance: Retinopathy of prematurity (ROP) is a leading cause of preventable blindness that disproportionately affects children born in low- and middle-income countries (LMICs). In-person and telemedical screening examinations can reduce this risk but are challenging to implement in LMICs owing to the multitude of at-risk infants and lack of trained ophthalmologists.
Objective: To implement an ROP risk model using retinal images from a single baseline examination to identify infants who will develop treatment-requiring (TR)-ROP in LMIC telemedicine programs.
Design, setting, and participants: In this diagnostic study conducted from February 1, 2019, to June 30, 2021, retinal fundus images were collected from infants as part of an Indian ROP telemedicine screening program. An artificial intelligence (AI)-derived vascular severity score (VSS) was obtained from images from the first examination after 30 weeks' postmenstrual age. Using 5-fold cross-validation, logistic regression models were trained on 2 variables (gestational age and VSS) for prediction of TR-ROP. The model was externally validated on test data sets from India, Nepal, and Mongolia. Data were analyzed from October 20, 2021, to April 20, 2022.
Main outcomes and measures: Primary outcome measures included sensitivity, specificity, positive predictive value, and negative predictive value for predictions of future occurrences of TR-ROP; the number of weeks before clinical diagnosis when a prediction was made; and the potential reduction in number of examinations required.
Results: A total of 3760 infants (median [IQR] postmenstrual age, 37 [5] weeks; 1950 male infants [51.9%]) were included in the study. The diagnostic model had a sensitivity and specificity, respectively, for each of the data sets as follows: India, 100.0% (95% CI, 87.2%-100.0%) and 63.3% (95% CI, 59.7%-66.8%); Nepal, 100.0% (95% CI, 54.1%-100.0%) and 77.8% (95% CI, 72.9%-82.2%); and Mongolia, 100.0% (95% CI, 93.3%-100.0%) and 45.8% (95% CI, 39.7%-52.1%). With the AI model, infants with TR-ROP were identified a median (IQR) of 2.0 (0-11) weeks before TR-ROP diagnosis in India, 0.5 (0-2.0) weeks before TR-ROP diagnosis in Nepal, and 0 (0-5.0) weeks before TR-ROP diagnosis in Mongolia. If low-risk infants were never screened again, the population could be effectively screened with 45.0% (India, 664/1476), 38.4% (Nepal, 151/393), and 51.3% (Mongolia, 266/519) fewer examinations required.
Conclusions and relevance: Results of this diagnostic study suggest that there were 2 advantages to implementation of this risk model: (1) the number of examinations for low-risk infants could be reduced without missing cases of TR-ROP, and (2) high-risk infants could be identified and closely monitored before development of TR-ROP.
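Worked check of the reported figures: the short Python sketch below recomputes two kinds of numbers from the Results. The examination-reduction percentages follow directly from the fractions given in the abstract; reading each fraction as examinations avoided over total examinations is an assumption. The lower confidence bounds on sensitivity are compared against the exact (Clopper-Pearson) binomial interval, whose lower limit has the closed form (alpha/2)^(1/n) when every one of n cases is detected. The abstract does not state which interval method was used or how many TR-ROP cases each test set contained, so the case counts 27, 6, and 53 are assumptions, chosen only because they give lower bounds that match the reported ones. The function and variable names are illustrative.

```python
def exact_lower_bound_all_detected(n_cases, alpha=0.05):
    """Lower limit of the exact (Clopper-Pearson) two-sided CI for a
    proportion when all n_cases are detected (observed sensitivity 100%).
    In that special case the limit solves p**n = alpha/2."""
    return (alpha / 2) ** (1 / n_cases)


def percent_of_total(numerator, denominator):
    """Percentage represented by numerator/denominator."""
    return 100 * numerator / denominator


# Fractions as printed in the abstract (fewer examinations required).
exam_fractions = {
    "India": (664, 1476),
    "Nepal": (151, 393),
    "Mongolia": (266, 519),
}

# ASSUMED numbers of TR-ROP cases per test set; not stated in the abstract.
assumed_tr_rop_cases = {"India": 27, "Nepal": 6, "Mongolia": 53}

exam_reduction_pct = {
    site: round(percent_of_total(num, den), 1)
    for site, (num, den) in exam_fractions.items()
}

sensitivity_lower_pct = {
    site: round(100 * exact_lower_bound_all_detected(n), 1)
    for site, n in assumed_tr_rop_cases.items()
}

for site in exam_fractions:
    print(site, exam_reduction_pct[site], sensitivity_lower_pct[site])
```

The width of the Nepal interval (lower bound 54.1%) illustrates why a 100% sensitivity estimate from a handful of cases should be interpreted cautiously: under the assumption above it would rest on only 6 TR-ROP cases.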