Cell-free RNA and fully convolutional dense network-based early preeclampsia prediction.
Zhuo Zhao, Bing Li, Xia Xiao, Jinjun Liu, Wang Zheng
DOI: https://doi.org/10.1002/ctm2.1371
IF: 8.554
2023-01-01
Clinical and Translational Medicine
Abstract: We propose a fully convolutional dense network (FCDN) model [1, 2] to predict preeclampsia (PE) with circulating cell-free RNA (cfRNA) [3–5]. The Individual Risk Score (IRS) output of the proposed FCDN model contributes to the literature on consistently monitoring the risk of PE, evaluating the effect of prophylactic treatments, and providing accurate as well as rapid screening and diagnosis of PE in populous developing countries with a high incidence of PE, such as China.
PE is a pregnancy-specific hypertensive disorder and leads to 10.2 deaths per 100 000 pregnancies [6–8], remaining the second leading cause of death among pregnant women in China. Diagnoses of PE are still regularly missed or delayed, and predicting PE in early gestation remains challenging. The trained network is designed to predict PE risk in terms of IRS according to variations in personal cfRNA profiling in early pregnancy (Figure 1A).
As a first step, standardized and cleaned cfRNA sequencing data from normal pregnancy (NP) and PE were downloaded from GSE192902 [5] in Gene Expression Omnibus; the samples were collected at ≤12 gestational weeks (gws) or at 13–20 gws. A total of 7160 detected cfRNAs were filtered to select cfRNAs with significant changes that could be used as indicators of PE risk. As illustrated in Figure 1B, we used multiple tests to optimize the parameters of the algorithm. A total of 29 cfRNAs were chosen as PE indicators for samples sequenced at ≤12 gws, and 25 cfRNAs were selected for samples sequenced at 13–20 gws (Table S1). Only one clinical diagnosis was available for any enrolled woman: NP or PE. Therefore, we defined women in the PE group as having the maximum IRS of 1 and women in the NP group as having the minimum IRS of 0. Based on the cfRNA sequencing data from enrolled women, the IRS can be calculated using Equations (2) and (3). We then calculated the IRS of each woman at her sampling time. The calculated IRS is regarded as the ground truth for the enrolled women.
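The exact filtering criterion and Equations (2) and (3) are not reproduced in this text, so the block below is only a minimal sketch of one generic way to shortlist discriminative features, not the authors' procedure. It ranks each cfRNA by the absolute standardized mean difference between the NP and PE groups and keeps the top few. The function name `rank_features`, the scoring rule, and the toy expression values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def rank_features(np_profiles, pe_profiles, top_k):
    """Return indices of the top_k features that best separate two groups.

    Each profile is a list of expression values, one per cfRNA.
    The score is an ASSUMED stand-in criterion (absolute standardized
    mean difference), not the filter used in the study.
    Each group needs at least two profiles, because stdev requires it.
    """
    n_features = len(np_profiles[0])
    scores = []
    for j in range(n_features):
        np_vals = [profile[j] for profile in np_profiles]
        pe_vals = [profile[j] for profile in pe_profiles]
        # Pooled standard deviation of the two groups for feature j.
        pooled_sd = ((stdev(np_vals) ** 2 + stdev(pe_vals) ** 2) / 2) ** 0.5
        diff = abs(mean(pe_vals) - mean(np_vals))
        # A feature with zero spread in both groups is scored 0 here,
        # a simplification that avoids dividing by zero.
        score = diff / pooled_sd if pooled_sd > 0 else 0.0
        scores.append((score, j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:top_k]]


# Toy data (hypothetical): 3 NP and 3 PE profiles over 3 cfRNAs.
# Only the first cfRNA differs clearly between the groups.
np_profiles = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.1], [0.9, 4.9, 1.9]]
pe_profiles = [[3.0, 5.0, 2.0], [3.2, 5.2, 2.2], [2.9, 4.8, 1.8]]

selected = rank_features(np_profiles, pe_profiles, top_k=1)
print(selected)
```

In a real analysis the score would typically be replaced by a formal statistical test with multiple-testing correction, and the cut-off tuned separately for each gestational window.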
At a sampling time of ≤12 gws, the average calculated IRS was 0.27 and 0.47 in the NP and PE groups, respectively (Figure 1C). At a sampling time of 13–20 gws, the average calculated IRS of the NP and PE groups was 0.39 and 0.57, respectively (Figure 1D). The average calculated IRS for the NP group differed notably from that of the PE group (Supplementary Figure 1). The results suggest that the filtered cfRNA indicators work well to distinguish NP from PE and support the application of the proposed model in clinical practice.
Next, we used an FCDN model to perform data regression (Figure 1E). The procedure of FCDN-based PE prediction is shown in Figure 1F. In the current study, different datasets were used for model training and validation. The dataset in GSE192902 was divided into Discover Cohort, Validation 1 Cohort and Validation 2 Cohort. For model training, Validation 2 Cohort (87 sets of real-world cfRNA profiles) and 7913 computer-generated cfRNA profiles were employed. For model validation, 1000 computer-generated cfRNA profiles were employed. For the final model validation (application), we used Discover Cohort and Validation 1 Cohort, which together include 215 sets of real-world cfRNA profiles. A more detailed method for FCDN construction, training and validation is given in the Supporting Information.
Through FCDN model training and validation (Figure 2A,B), the loss value (mean absolute error [MAE]) of the probability prediction decreased to 0.027, and an optimized model was obtained. To validate the prediction accuracy of the model, we used real-world cfRNA expression data [5] as the input set x_test for the FCDN model to obtain the FCDN-based IRS. The FCDN-based IRS was then compared with the ground truth (calculated IRS) derived from real-world cfRNA profiling. At a sampling time of ≤12 gws, the tendency and amplitude of the FCDN-based IRS (prediction results) resembled the ground truth, suggesting that the FCDN model fits the data well (Figure 2C).
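For reference, the MAE used above as the loss can be computed directly from paired predictions and ground-truth values. The sketch below also returns two related summaries, the maximum absolute error and the peak-to-valley range of the signed error. Their definitions here are assumptions based on common usage, and the function name `error_summary` and the toy values are hypothetical.

```python
def error_summary(predicted, truth):
    """Summarize prediction error between two equal-length score lists.

    mae     : mean of |predicted - truth|
    max_abs : largest |predicted - truth|
    pv      : peak-to-valley range of the signed error, ASSUMED here to be
              max(predicted - truth) - min(predicted - truth)
    """
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predicted, truth)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mae": sum(abs_errors) / len(abs_errors),
        "max_abs": max(abs_errors),
        "pv": max(errors) - min(errors),
    }


# Toy IRS values (hypothetical, not taken from the study).
truth = [0.25, 0.5, 0.75, 0.5]
predicted = [0.5, 0.5, 0.5, 0.75]

summary = error_summary(predicted, truth)
print(summary)
```

Because the peak-to-valley value uses the signed error, it can exceed the maximum absolute error whenever the model both over- and under-predicts.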
The MAE between the prediction result and the ground truth was only 0.032. PE and NP can be separated using the averaged FCDN-based IRS. We also calculated the FCDN-based IRS for samples collected at 13–20 gws (Figure 2D). Over the whole scale, the prediction results approximate the ground truth. The MAE between the prediction result and the ground truth was only 0.041, indicating that the FCDN model predicted the ground truth well. The error amplitude of the FCDN-based IRS for cfRNA samples collected at ≤12 gws is shown in Figure 2E. The maximum value of the absolute error, the peak-to-valley (PV) value of the error, and the mean value of the absolute error were 0.12, 0.16 and 0.046, respectively. For samples collected at 13–20 gws (Figure 2F), the maximum value of the absolute error reached 0.16, and the PV value of the error reached 0.27. The mean absolute error was 0.008. In short, the prediction error for IRS was well controlled within a small amplitude, and the FCDN model was able to fit the data well.
We also considered processing efficiency, which matters when handling the large numbers of datasets collected in population screening. Prediction time was therefore used as a further benchmark to evaluate the method. In this test, the cfRNA profiling samples were fed into the trained FCDN model, and the time required to output an IRS value was recorded. As shown in Figure 2G,H, the results of 15 consecutive experiments showed that the average time required to output an IRS reached 10⁻⁵ s per sample.
In summary, we employed novel biomarker cfRNAs and an FCDN model to output an IRS to predict PE. The prediction accuracy and computational time of the proposed model reached 0.95 and 10⁻⁵ s per sample, respectively. The reported method provides a reliable tool for rapid and minimally invasive monitoring of individual PE risk and sheds new light on maternal and neonatal healthcare (Figure 3).
We thank Mira N. Moufarrej, Sevahn K. Vorperian, Ronald J. Wong, Ana A. Campos, Cecele C.
Quaintance, Rene V. Sit, Michelle Tan, Angela M. Detweiler, Honey Mekonen, Norma F. Neff, Courtney Baruch-Gravett, James A. Litch, Maurice L. Druzin, Virginia D. Winn, Gary M. Shaw, David K. Stevenson, and Stephen R. Quake for their illuminating research and their great effort in sample collection and cfRNA sequencing. We also thank Mr Tao Liu (Guangzhou Gene Denovo Biotechnology Co., Ltd.) for his kind help with bioinformatics analysis. We also express our sincere thanks to all the enrolled participants for their contribution to medical research.
The authors declare no conflict of interest.
This work was supported by the National Natural Science Foundation of China (82071670 and 81771616) to Jinjun Liu, Natural Science Foundation of Shaanxi Province (2023-JC-QN-0954) to Zheng Wang, Funding of Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases (2022YHJB08) to Xia Xiao and Jilin Science and Technology Development Project No. 20210502028ZP.
Codes and scripts developed for this study are available upon reasonable request.