Abstract

STUDY QUESTION
What are the accuracy and agreement of embryologists when assessing the implantation probability of blastocysts using time-lapse imaging (TLI), and can they be improved with a data-driven algorithm?

SUMMARY ANSWER
The overall interobserver agreement of a large panel of embryologists was moderate and prediction accuracy was modest, while the purpose-built artificial intelligence model generally resulted in higher performance metrics.

WHAT IS KNOWN ALREADY
Previous studies have demonstrated significant interobserver variability amongst embryologists when assessing embryo quality. However, data concerning embryologists’ ability to predict implantation probability using TLI are still lacking. Emerging technologies based on data-driven tools have shown great promise for improving embryo selection and predicting clinical outcomes.

STUDY DESIGN, SIZE, DURATION
TLI video files of 136 embryos with known implantation data were retrospectively collected from two clinical sites between 2018 and 2019 for the performance assessment of 36 embryologists and comparison with a deep neural network (DNN).

PARTICIPANTS/MATERIALS, SETTING, METHODS
We recruited 39 embryologists from 13 different countries. All participants were blinded to clinical outcomes. A total of 136 TLI videos of embryos that reached the blastocyst stage were used for this experiment. Each embryo’s likelihood of successfully implanting was assessed by 36 embryologists, who provided implantation probability grades (IPGs) from 1 to 5, where 1 indicates a very low likelihood of implantation and 5 indicates a very high likelihood. Subsequently, three embryologists with over 5 years of experience provided Gardner scores. All 136 blastocysts were categorized into three quality groups based on their Gardner scores. Embryologist predictions were then converted into predictions of implantation (IPG ≥ 3) and no implantation (IPG ≤ 2).
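The conversion of IPGs into binary predictions, and the accuracy computed from them, can be sketched as follows. This is a minimal illustration, not the study's code: the function names `ipg_to_prediction` and `prediction_accuracy` and the toy grades and outcomes are hypothetical, and only the threshold (IPG ≥ 3 counts as predicted implantation) comes from the text above.

```python
from typing import Sequence


def ipg_to_prediction(ipg: int) -> bool:
    """Map an implantation probability grade (1-5) to a binary prediction.

    Threshold taken from the abstract: IPG >= 3 means "implantation",
    IPG <= 2 means "no implantation".
    """
    if not 1 <= ipg <= 5:
        raise ValueError(f"IPG must be between 1 and 5, got {ipg}")
    return ipg >= 3


def prediction_accuracy(ipgs: Sequence[int], implanted: Sequence[bool]) -> float:
    """Fraction of embryos whose binarized IPG matches the known outcome."""
    if len(ipgs) != len(implanted) or not ipgs:
        raise ValueError("need equally long, non-empty grade and outcome lists")
    correct = sum(ipg_to_prediction(g) == y for g, y in zip(ipgs, implanted))
    return correct / len(ipgs)


# Hypothetical grades from one embryologist for six embryos (not study data).
toy_ipgs = [5, 2, 3, 1, 4, 2]
toy_outcomes = [True, False, False, False, True, True]
toy_accuracy = prediction_accuracy(toy_ipgs, toy_outcomes)
```

Averaging such per-embryologist accuracies over the whole panel would give a figure comparable in kind to the average accuracy reported in the results, assuming that is how the average was formed.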
Embryologists’ performance and agreement were assessed using the Fleiss kappa coefficient. A DNN was developed and evaluated with 10-fold cross-validation to provide IPGs for TLI video files. The model’s performance was compared to that of the embryologists.

MAIN RESULTS AND THE ROLE OF CHANCE
Logistic regression was used to test the following potential confounding variables: country of residence, academic level, embryo scoring system, log years of experience and experience using TLI. None were found to have a statistically significant impact on embryologist performance at α = 0.05. The average implantation prediction accuracy for the embryologists was 51.9% for all embryos (N = 136). The average accuracy of the embryologists when assessing top quality and poor quality embryos (according to the Gardner score categorizations) was 57.5% and 57.4%, respectively, and 44.6% for fair quality embryos. Overall interobserver agreement was moderate (κ = 0.56, N = 136). The best agreement was achieved in the poor + top quality group (κ = 0.65, N = 77), while the agreement in the fair quality group was lower (κ = 0.25, N = 59). The DNN showed an overall accuracy rate of 62.5%, with accuracies of 62.2%, 61% and 65.6% for the poor, fair and top quality groups, respectively. The AUC for the DNN was higher than that of the embryologists overall (0.70 DNN vs 0.61 embryologists) as well as in all of the Gardner groups (DNN vs embryologists: poor, 0.69 vs 0.62; fair, 0.67 vs 0.53; top, 0.77 vs 0.54).

LIMITATIONS, REASONS FOR CAUTION
Blastocyst assessment was performed using video files acquired from time-lapse incubators, where each video contained data from a single focal plane. Clinical data regarding the underlying cause of infertility and endometrial thickness before the transfer were not available, yet these factors may explain implantation failure and lower accuracy of IPGs. Implantation was defined as the presence of a gestational sac, whereas the detection of fetal heartbeat is a more robust marker of embryo viability.
The raw data were anonymized to the extent that it was not possible to quantify the number of unique patients and cycles included in the study, potentially masking the effect of bias from a limited patient pool. Furthermore, the lack of demographic data makes it difficult to draw conclusions on how representative the dataset was of the wider population. Finally, embryologists were required to assess implantation potential, not embryo quality. Although this is not the traditional approach to embryo evaluation, morphology/morphokinetics as a means of assessing embryo quality is believed to be strongly correlated with viability and, for some methods, implantation potential.

WIDER IMPLICATIONS OF THE FINDINGS
Embryo selection is a key element in IVF success and continues to be a challenge. Improving predictive ability could assist in optimizing implantation success rates and other clinical outcomes and could minimize the financial and emotional burden on the patient. This study demonstrates moderate agreement rates between embryologists, likely due to the subjective nature of embryo assessment. In particular, we found that average embryologist accuracy and agreement were significantly lower for fair quality embryos than for top and poor quality embryos. Using data-driven algorithms as an assistive tool may help IVF professionals increase success rates and promote much-needed standardization in the IVF clinic. Our results indicate a need for further research regarding technological advancement in this field.

STUDY FUNDING/COMPETING INTEREST(S)
Embryonics Ltd is an Israel-based company. Funding for the study was partially provided by the Israeli Innovation Authority, grant #74556.

TRIAL REGISTRATION NUMBER
N/A.
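For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of the standard Fleiss kappa formula. It is not the authors' implementation. The function name `fleiss_kappa` and the toy ratings are hypothetical, and the abstract does not state whether kappa was computed on the five-level IPGs or on the binarized predictions, so the example simply uses binary labels.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for a fixed number of raters per subject.

    `ratings` holds one row per subject (embryo); each row lists the
    category label assigned by every rater (embryologist).
    """
    if not ratings:
        raise ValueError("need at least one rated subject")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    per_subject_agreement = []
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of rater pairs that agree on this subject.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_subject_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    observed = sum(per_subject_agreement) / len(ratings)
    total_ratings = len(ratings) * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical binarized predictions: 4 embryos, 3 raters each (not study data).
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
toy_kappa = fleiss_kappa(toy_ratings)
```

Kappa rescales observed agreement by the agreement expected from chance alone, which is why a group with mixed opinions, such as the fair quality embryos, can show a low kappa even when raw pairwise agreement looks reasonable.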

Embryologist agreement when assessing blastocyst implantation probability: is data-driven prediction the solution to embryo assessment subjectivity?
