Diagnostic Performance of Machine Learning-Derived Obstructive Sleep Apnea Prediction Tools in Large Clinical and Community-based Samples.
Steven Holfinger, M. Melanie Lyons, Brendan T. Keenan, Diego R. Mazzotti, Jesse W Mindel, Greg Maislin, Peter A. Cistulli, Kate Sutherland, Nigel McArdle, Bhajan Singh, Ning-Hung Chen, Thorarinn Gislason, Thomas Penzel, Fang Han, Qing Yun Li, Richard Schwab, Allan I. Pack, Ulysses J. Magalang
DOI: https://doi.org/10.1016/j.chest.2021.10.023
IF: 9.6
2021-01-01
Chest
Abstract:
Background: Prediction tools without patient-reported symptoms could facilitate widespread identification of obstructive sleep apnea (OSA).
Research Question: What is the diagnostic performance of machine learning-derived OSA prediction tools that use readily available data without patient responses to questionnaires, and how do they compare with the STOP-BANG tool in clinical and community-based samples?
Study Design and Methods: Logistic regression and machine learning techniques, including artificial neural network (ANN), random forests (RF), and kernel support vector machine (KSVM), were used to determine the ability of age, gender, body mass index (BMI), and race to predict OSA status. A retrospective cohort of 17,448 subjects from sleep clinics within the international Sleep Apnea Global Interdisciplinary Consortium (SAGIC) was randomly split into training (n=10,469) and validation (n=6,979) sets. Model comparisons were performed using the area under the receiver operating characteristic curve (AUC). Trained models were compared with the STOP-BANG questionnaire in two prospective testing datasets: an independent clinic-based sample from SAGIC (n=1,613) and a community-based sample from the Sleep Heart Health Study (SHHS; n=5,599).
Results: The AUCs [95% CI] of the machine learning models were significantly higher than that of logistic regression (0.61 [0.60, 0.62]) in both the training and validation datasets (ANN: 0.68 [0.66, 0.69], RF: 0.68 [0.67, 0.70], and KSVM: 0.66 [0.65, 0.67]). In the SAGIC testing sample, the ANN (0.70 [0.68, 0.72]) and RF (0.70 [0.68, 0.73]) models had AUCs similar to that of the STOP-BANG (0.71 [0.68, 0.72]). In the SHHS testing sample, the ANN (0.72 [0.71, 0.74]) had an AUC similar to that of the STOP-BANG (0.72 [0.70, 0.73]).
Interpretation: OSA prediction tools using machine learning without patient-reported symptoms provide better diagnostic performance than logistic regression. In clinical and community-based samples, the symptomless ANN tool has similar diagnostic performance to a widely used prediction tool that includes patient symptoms. Machine learning-derived algorithms may have utility for widespread identification of OSA.
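
Illustrative note: the abstract reports each model's AUC with a 95% CI but does not say how the intervals were computed. The Python sketch below shows one common approach, assumed here for illustration only: a rank-based (Mann-Whitney) AUC with a percentile bootstrap interval. The function names `auc` and `bootstrap_ci`, the number of resamples, and the toy scores are all hypothetical and are not taken from the study.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U statistic
    return u / (n_pos * n_neg)


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data only: four hypothetical predicted risks and OSA labels (1 = OSA).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
```

With a real validation set, `scores` would hold each model's predicted probability of OSA and `labels` the OSA status; running the same functions on the STOP-BANG score would allow a like-for-like AUC comparison of the kind the abstract describes.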