Establishment and Analysis of an Artificial Neural Network Model for Early Detection of Polycystic Ovary Syndrome Using Machine Learning Techniques
Yumi Wu, QiWei Xiao, ShouDong Wang, Huanfang Xu, YiGong Fang
DOI: https://doi.org/10.2147/jir.s438838
IF: 4.5
2023-11-29
Journal of Inflammation Research
Authors and affiliations: Yumi Wu (1), QiWei Xiao (1), ShouDong Wang (2), Huanfang Xu (1,3), YiGong Fang (1,3)
(1) Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
(2) The Out-Patient Department of TCM of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
(3) Acupuncture and Moxibustion Hospital of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
Correspondence: YiGong Fang; Huanfang Xu, Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, 16 Dongzhimennei South St, Dongcheng, Beijing, 100700, People's Republic of China, Tel +86 13520175177, Fax +86 010-64089219
Abstract:
Background: This study aimed to identify novel gene combinations and to develop an early diagnostic model for Polycystic Ovary Syndrome (PCOS) by integrating artificial neural network (ANN) and random forest (RF) methods.
Methods: We retrieved and processed gene expression datasets for PCOS from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in the training set were identified using the "limma" R package. We then performed enrichment analyses of the DEGs using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), together with an immune cell infiltration analysis. Critical genes were selected from the DEGs using random forests, and a new diagnostic model for PCOS was then developed using artificial neural networks.
Results: We identified 130 up-regulated genes and 132 down-regulated genes in PCOS compared to normal samples. GO analysis revealed significant enrichment in myofibril-related terms and highlighted biological functions related to myofilament sliding, myofibril, and actin binding. The immune cell types present in PCOS samples differed from those in normal tissues. A random forest algorithm identified 10 significant genes proposed as potential PCOS-specific biomarkers.
Using these genes, an artificial neural network diagnostic model accurately distinguished PCOS from normal samples. The diagnostic model was validated on the independent validation set, and the resulting area under the receiver operating characteristic curve (AUC) values were consistent with the anticipated outcomes.
Conclusion: Using unique gene combinations, this research created a diagnostic model by combining random forest techniques with artificial neural networks. The AUC values indicated strong performance of the diagnostic model.
Keywords: polycystic ovary syndrome, machine learning techniques, artificial neural network model, early diagnostic model, artificial neural networks, random forest

Polycystic ovary syndrome (PCOS) is a heterogeneous endocrine disorder associated with a wide range of symptoms.[1] Three main sets of diagnostic criteria have been proposed: those of the National Institutes of Health (NIH),[2] the consensus of the European Society for Human Reproduction and Embryology (ESHRE) and the American Society for Reproductive Medicine (ASRM),[3,4] and the reference criteria of the Androgen Excess Society (AES).[5] Despite these proposals, the field has yet to reach a consensus.[6] A complex genetic architecture underlies the multifactorial etiology of PCOS.[7] Moreover, previous studies have found that race is closely associated with PCOS phenotype, owing to differences in genetic metabolic disorders and environmental tendencies. This study therefore aims to investigate unique and essential gene combinations and to develop an early diagnostic model for PCOS.

The study of disease mechanisms has benefited significantly from the advancement and increased precision of RNA sequencing technologies and from the availability of microarray technology.[8] The primary challenge in developing a classification framework based on gene expression profiles is identifying the variables most relevant for classification. We apply machine learning algorithms, including RF [9,10] and ANN,[11] to address this issue. Unlike standard statistical methods, machine learning involves extracting and analyzing information from case reports. We therefore combined RF and ANN to develop a new PCOS diagnostic model, formulating and exploring hypotheses on the training set and then validating the model on the validation set.[8]

This study collected three datasets from the GEO database (GSE6798, GSE84958, GSE43264). GSE6798 and GSE84958 were designated as the training set, and GSE43264 served as the validation set. using the "limma" R package -Abstract Truncated-
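
The text above evaluates the diagnostic model by its AUC on an independent validation set. As a minimal sketch of what that metric measures, AUC can be computed as the fraction of (PCOS, control) sample pairs in which the PCOS sample receives the higher model score, with ties counted as half. The function name `roc_auc` and the labels and scores below are hypothetical illustrative values, not data or code from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical validation-set labels (1 = PCOS, 0 = control) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups. The pairwise form is quadratic in sample count, which is acceptable for small validation cohorts.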
immunology