Predicting polycystic ovary syndrome (PCOS) with machine learning algorithms from electronic health records
Zahra Zad, Victoria S. Jiang, Amber T. Wolf, Taiyao Wang, Jay Jojo Cheng, Ioannis Ch. Paschalidis, Shruthi Mahalingaiah
DOI: https://doi.org/10.1101/2023.07.27.23293255
2023-08-04
medRxiv
Abstract:
Context. Predictive models have been used to aid early diagnosis of PCOS, but existing models are limited to fertility clinic populations.
Objective. Build a predictive model based on an outpatient population at risk for PCOS to facilitate earlier diagnosis and risk prediction.
Design. Retrospective cohort study using a safety-net hospital's electronic medical records (EMR) from 2003 to 2016.
Setting. Hospital outpatient clinics.
Patients or Other Participants. 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics/gynecology, endocrinology, family medicine, or general internal medicine.
Intervention(s). None.
Main Outcome Measure(s). Four prediction outcomes based on the Rotterdam criteria, using ICD-9 codes for PCOS, irregular menstruation, hyperandrogenism, and polycystic ovarian morphology (PCOM) on ultrasound.
Results. We developed predictive models using four machine learning methods: logistic regression, support vector machines, gradient boosted trees, and random forests. Hormone values (FSH, LH, estradiol, and SHBG) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved AUCs of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and a positive β-hCG.
Conclusions. Among an at-risk population, machine learning algorithms were used to predict PCOS. This approach may guide early detection of PCOS within EMR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. However, additional studies including an entire health system's patient population are necessary for model validation.
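The reported AUCs summarize how well each model ranks held-out patients who later received a PCOS diagnosis above those who did not. As an illustration only, the sketch below computes ROC AUC from test-set labels and predicted scores using its pairwise (Mann-Whitney) definition. It is not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive case
    (label 1) receives a higher score than a randomly chosen negative
    case (label 0). Tied scores count as half a win. This equals the
    normalized Mann-Whitney U statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    # Compare every positive/negative pair (O(n_pos * n_neg); fine for a sketch).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical held-out labels (1 = later PCOS diagnosis) and model scores.
    y_test = [1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.6, 0.4, 0.6, 0.8, 0.1]
    print(round(roc_auc(y_test, y_score), 3))  # 8.5 of 9 pairs ranked correctly -> 0.944
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 80-85% values mean that a randomly drawn future PCOS case outscores a randomly drawn non-case roughly four times out of five. For real data, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently via ranks.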