A 3-Gene Random Forest Model to Diagnose Non-obstructive Azoospermia Based on Transcription Factor-Related Genes

Ranran Zhou, Jingjing Liang, Qi Chen, Hu Tian, Cheng Yang, Cundong Liu
DOI: https://doi.org/10.1007/s43032-022-01008-8
2022-06-18
Reproductive Sciences
Abstract: Non-obstructive azoospermia (NOA) is one of the most severe forms of male infertility, but diagnostic biomarkers with high sensitivity and specificity remain largely unknown. Transcription factors (TFs) play essential roles in the pathological processes of many diseases. Herein, we aimed to identify TFs with high diagnostic ability for NOA using machine learning algorithms. Transcriptome data from the testicular tissue of 11 control and 47 NOA subjects served as the training dataset, and 1665 TFs were retrieved from the HumanTFDB. Using feature selection methods, including genomic difference analysis, Lasso, Boruta, SVM-RFE, and logistic regression, ETV2, TBX2, and ZNF689 were ultimately selected and incorporated into the random forest (RF) diagnostic model. The RF model showed high predictive power in the training cohort (F-measure = 1) and in two external validation cohorts (n = 31, F-measure = 0.902; n = 20, F-measure = 0.941). Seminal plasma and testicular biopsy samples from 20 control and 20 NOA patients were collected at the local hospital, and the expression levels of ETV2, TBX2, and ZNF689 were measured by RT-qPCR and immunohistochemistry. The RF model could also distinguish the NOA samples in this local cohort (F-measure = 0.741). Single-cell RNA sequencing analysis of 432 testicular cell samples from an NOA patient showed that ETV2, TBX2, and ZNF689 were all significantly associated with spermatogenesis. In summary, a 3-TF random forest diagnostic model was successfully established, providing novel insights into the latent mechanisms of NOA.
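The cohort results above are reported as F-measures. As a minimal sketch, assuming the F-measure is the standard F1 score (beta = 1) with NOA treated as the positive class (the abstract does not state beta or the positive label), the metric is the harmonic mean of precision and recall. The function name `f_measure` and the toy labels below are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def f_measure(y_true: Sequence[str], y_pred: Sequence[str],
              positive: str = "NOA", beta: float = 1.0) -> float:
    """F-beta score with `positive` as the positive class; beta=1 gives F1.

    Assumption: the paper's "F-measure" is F1 on the NOA class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives for the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined), so F is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical toy labels: 4 NOA and 2 control samples.
    y_true = ["NOA", "NOA", "NOA", "NOA", "control", "control"]
    y_pred = ["NOA", "NOA", "NOA", "control", "NOA", "control"]
    # TP = 3, FP = 1, FN = 1, so precision = recall = 0.75.
    print(f_measure(y_true, y_pred))  # 0.75
    # Perfect predictions give an F-measure of 1, as in the training cohort.
    print(f_measure(y_true, y_true))  # 1.0
```

An F-measure of 1, as reported for the training cohort, therefore means no false positives and no false negatives on the positive class; lower values such as 0.741 reflect some combination of missed NOA cases and controls misclassified as NOA.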
obstetrics & gynecology, reproductive biology