A noninvasive prenatal test pipeline with a well-generalized machine-learning approach for accurate fetal trisomy detection using low-depth short sequence data

Qiongrong Huang, Jianjiang Zhu, Jianbo Lu, Qiaojun Fang, Hong Qi, Bin Tu
DOI: https://doi.org/10.1016/j.eswa.2024.123759
IF: 8.5
2024-03-25
Expert Systems with Applications
Abstract: Noninvasive prenatal testing (NIPT) reduces the risk of procedure-related miscarriage. However, owing to limitations in accuracy, special fetal cases, and economic and policy gaps, NIPT still cannot replace traditional surgical methods. Developing a pipeline that combines low cost, low technical difficulty, stability and high accuracy is a major challenge to the wide adoption of NIPT. This study proposes a new pipeline for detecting fetal trisomy that comprises three steps: (1) 40 bp single-end sequencing, (2) P chrN calculations, and (3) logistic regression (LR) models. A subset of a public dataset (100 of 144 samples) was used to train the models and select features in the machine-learning pipeline. Independent testing used 314 samples from different sources. We compared the performance of our method with that of the bioinformatics method in wide use today. Our model shows high robustness to data from different sources. The best final model achieved an AUC of 99.85 % for predicting T21 using chr21 features, which are DNA fragment concentrations. The AUCs for predicting T18 and T13 were 99.50 % and 97.70 %, respectively, using all features from the 24 chromosomes. The positive predictive values (PPVs) for T21, T18 and T13 were 91.67 %, 93.33 % and 83.33 %, respectively, higher than those obtained with standard bioinformatics methods. The negative predictive values (NPVs) for T21, T18 and T13 were 100 %, 99.33 % and 98.70 %, respectively. Our approach does not need to calculate fetal fraction (FF) and can handle samples from a wide range of gestational ages (GA), twin pregnancies and fetal mosaicism. It achieves accuracy comparable to that of the current standard bioinformatics analysis on low-depth sequencing data. This convenient pipeline can be used independently of traditional bioinformatics methods, and its performance has been tested in real clinical practice. Our pipeline can be an important aid for the detection of fetal trisomy in clinical NIPT, which will help further popularize NIPT.
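To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a per-chromosome feature can be approximated as the fraction of mapped reads falling on each chromosome (the abstract does not define P chrN, so this is only a stand-in), and it shows how PPV and NPV follow from predicted and true labels. The function names `chromosome_fractions` and `ppv_npv` and all toy data are hypothetical; the logistic regression step itself is not shown.

```python
from collections import Counter


def chromosome_fractions(read_chromosomes):
    """Fraction of mapped reads per chromosome.

    Assumption: this read fraction stands in for a per-chromosome
    concentration feature; the abstract does not give the exact formula.
    """
    counts = Counter(read_chromosomes)
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


def ppv_npv(y_true, y_pred):
    """Positive and negative predictive value from binary labels (1 = trisomy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # PPV = TP / (TP + FP); NPV = TN / (TN + FN); NaN when undefined.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Toy data (hypothetical): the chromosome each read was mapped to.
reads = ["chr1"] * 6 + ["chr21"] * 2 + ["chr18"] * 2
fractions = chromosome_fractions(reads)

# Toy labels (hypothetical): two true positives, one false positive,
# three true negatives, no false negatives.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
ppv, npv = ppv_npv(y_true, y_pred)
```

In this toy case a single false positive lowers the PPV while the NPV stays at 1.0, the same pattern as the reported results, where NPVs are near 100 % and PPVs are lower.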
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
What problem does this paper attempt to address?