A hybrid artificial intelligence model leverages multi-centric clinical data to improve fetal heart rate pregnancy prediction across time-lapse systems
A Duval,D Nogueira,N Dissler,M Maskani Filali,F Delestro Matos,L Chansel-Debordeaux,M Ferrer-Buitrago,E Ferrer,V Antequera,M Ruiz-Jorro,A Papaxanthos,H Ouchchane,B Keppi,P-Y Prima,G Regnier-Vigouroux,L Trebesses,C Geoffroy-Siraudin,S Zaragoza,E Scalici,P Sanguinet,N Cassagnard,C Ozanon,A De La Fuente,E Gómez,M Gervoise Boyer,P Boyer,E Ricciarelli,X Pollet-Villard,A Boussommier-Calleja
DOI: https://doi.org/10.1093/humrep/dead023
IF: 6.1
2023-02-10
Human Reproduction
Abstract:
STUDY QUESTION: Can artificial intelligence (AI) algorithms developed to assist embryologists in evaluating embryo morphokinetics be enriched with multi-centric clinical data to better predict clinical pregnancy outcome?
SUMMARY ANSWER: Training algorithms on multi-centric clinical data significantly increased AUC compared to algorithms that only analyzed the time-lapse system (TLS) videos.
WHAT IS KNOWN ALREADY: Several AI-based algorithms have been developed to predict pregnancy, most of them based only on analysis of the time-lapse recording of embryo development. It remains unclear, however, whether considering numerous clinical features can improve the predictive performance of time-lapse based embryo evaluation.
STUDY DESIGN, SIZE, DURATION: A dataset of 9986 embryos (95.60% with a known clinical pregnancy outcome, 32.47% frozen transfers) from 5226 patients from 14 European fertility centers (in two countries) recorded with three different TLS was used to train and validate the algorithms. A total of 31 clinical factors were collected. A separate test set (447 videos) was used to compare performance between embryologists and the algorithm.
PARTICIPANTS/MATERIALS, SETTING, METHODS: Clinical pregnancy (defined as a pregnancy leading to a fetal heartbeat) outcome was first predicted using a 3D convolutional neural network that analyzed videos of the embryonic development up to 2 or 3 days of development (33% of the database) or up to 5 or 6 days of development (67% of the database). The output video score was then fed as input alongside clinical features to a gradient boosting algorithm that generated a second score corresponding to the hybrid model. AUC was computed across the 7 folds of the validation dataset for both models. These predictions were compared to those of 13 senior embryologists made on the test dataset.
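The fold-wise evaluation described in the methods (one AUC per fold for each model, then a paired Wilcoxon test on the per-fold AUCs) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `auc`, `fold_aucs`, and `wilcoxon_exact` are hypothetical, the scores are synthetic stand-ins rather than outputs of a 3D convolutional network or a gradient boosting model, and the abstract does not say whether an exact or approximate form of the Wilcoxon test was used.

```python
import random
from itertools import product

def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fold_aucs(scores, labels, k=7, seed=0):
    """AUC on each of k disjoint folds; a fixed seed gives every model the same folds."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [auc([scores[i] for i in f], [labels[i] for i in f]) for f in folds]

def wilcoxon_exact(a, b):
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples.

    Assumes a small number of pairs and no tied absolute differences;
    pairs with a zero difference are dropped.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    total = n * (n + 1) // 2
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    stat = min(w_plus, total - w_plus)
    # Enumerate every sign assignment and count those at least as extreme.
    hits = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, sg in zip(ranks, signs) if sg)
        hits += min(s, total - s) <= stat
    return hits / 2 ** n

# Synthetic stand-ins (assumption): the "hybrid" score is simply a less noisy signal.
rng = random.Random(42)
labels = [int(rng.random() < 0.35) for _ in range(1400)]
video = [y + rng.gauss(0, 1.6) for y in labels]
hybrid = [y + rng.gauss(0, 1.2) for y in labels]

video_aucs = fold_aucs(video, labels)
hybrid_aucs = fold_aucs(hybrid, labels)
print("mean video AUC :", sum(video_aucs) / len(video_aucs))
print("mean hybrid AUC:", sum(hybrid_aucs) / len(hybrid_aucs))
print("paired Wilcoxon p:", wilcoxon_exact(hybrid_aucs, video_aucs))
```

With seven paired folds, the smallest two-sided p-value an exact signed-rank test can return is 2/2^7 ≈ 0.0156, reached when one model has the higher AUC on every fold.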
MAIN RESULTS AND THE ROLE OF CHANCE: The average AUC of the hybrid model across all 7 folds was significantly higher than that of the video model (0.727 versus 0.684, respectively, P = 0.015; Wilcoxon test). A SHapley Additive exPlanations (SHAP) analysis of the hybrid model showed that the six most important features to predict pregnancy were morphokinetics of the embryo (video score), oocyte age, total gonadotrophin dose intake, number of embryos generated, number of oocytes retrieved, and endometrium thickness. The hybrid model was shown to be superior to embryologists with respect to different metrics, including the balanced accuracy (P ≤ 0.003; Wilcoxon test). The likelihood of pregnancy was linearly linked to the hybrid score, with increasing odds ratio (maximum P-value = 0.001), demonstrating the ranking capacity of the model. Training individual hybrid models did not improve predictive performance. A clinic hold-out experiment was conducted and resulted in AUCs ranging between 0.63 and 0.73. Performance of the hybrid model did not vary between TLS or between subgroups of embryos transferred at different days of embryonic development. The hybrid model did fare better for patients older than 35 years (P < 0.001; Mann–Whitney test), and for fresh transfers (P < 0.001; Mann–Whitney test).
LIMITATIONS, REASONS FOR CAUTION: Participating centers were located in two countries, thus limiting the generalization of our conclusion to wider subpopulations of patients. Not all clinical features were available for all embryos, thus limiting the performance of the hybrid model in some instances.
WIDER IMPLICATIONS OF THE FINDINGS: Our study suggests that considering clinical data improves pregnancy prediction performance and that there is no need to retrain algorithms at the clinic level unless clinics follow strikingly different practices.
This study characterizes a versatile AI algorithm with similar performance on different time-lapse microscopes and on embryos transferred at different development stages. It can also assist with patients of different ages and treatment protocols, although with varying performance, presumably because predicting a fetal heartbeat becomes easier or harder depending on the clinical context. This AI model can be made widely available and can help embryologists in a wide range of clinical scenarios to standardize their practices.
STUDY FUNDING/COMPETING INTEREST(S): Funding for the study was provided by ImVitro with grant funding received in part from BPIFrance (Bourse French Tech Emergence (DOS0106572/00), Paris Innovation Amorçage (DOS0132841/00), and Aide au Développement DeepTech (DOS0152872/00)). A.B.-C. is a co-owner of, and holds stocks in, ImVitro SAS. A.B.-C. and F.D.M. hold a patent for ‘Devices and processes for machine learning prediction of in vitro fertilization’ (EP20305914.2). A.D., N.D., M.M.F., and F.D.M. are or have been employees of ImVitro and have been granted stock options. X.P.-V. has been paid as a consultant to ImVitro and has been granted stock options of ImVitro. L.C.-D. and C.G.-S. have undertaken paid consultancy for ImVitro SAS. The remaining authors have no conflicts to declare.
TRIAL REGISTRATION NUMBER: N/A.
obstetrics & gynecology,reproductive biology