Machine Learning Outperforms Traditional Logistic Regression and Offers New Possibilities for Cardiovascular Risk Prediction: A Study Involving 143,043 Chinese Patients with Hypertension

Yang Xi, Hongyi Wang, Ningling Sun
DOI: https://doi.org/10.3389/fcvm.2022.1025705
IF: 3.6
2022-01-01
Frontiers in Cardiovascular Medicine
Abstract:
Introduction: Identifying people at risk of cardiovascular disease (CVD) is a cornerstone of preventive cardiology. We developed machine learning (ML) algorithms and investigated their performance in predicting patients’ current CVD risk (defined in this study as coronary heart disease and stroke).
Materials and methods: We compared traditional logistic regression (LR) for CVD risk prediction with five ML algorithms (LR with Elastic-Net, Random Forest (RF), XGBoost (XGB), Support Vector Machine, and Deep Learning) and with an Ensemble model that averages the predictions of RF, XGB, and Deep Learning. We used pre-existing patient-level data from a multi-center, cross-sectional study, the Microalbuminuria Screening in Hypertensive Patients Project, initiated by the China International Exchange and Promotive Association for Medical and Healthcare, which enrolled 143,043 patients with hypertension from 600 tertiary, secondary, and community hospitals. Each of the five ML algorithms incorporated 18 variables, including demographics, examinations, comorbidities, and treatment regimens, and each was trained and evaluated using 5-fold cross-validation. Predictive accuracy was assessed by the area under the receiver operating characteristic curve (AUROC).
Results: Patients’ mean age was 62 ± 12 years, and 57% were men. The advanced ML algorithms outperformed the traditional LR model. In particular, the Ensemble model showed the best discrimination, with an AUROC of 0.760 compared with 0.737 for LR, and it also outperformed the other tested models.
Conclusion: We establish an Ensemble model that predicts patients’ current CVD risk from routine information better than the traditional LR model does. ML can help physicians design follow-up plans based on more accurate predictions, offering new possibilities for short-term risk prediction and early detection. Further, ML models can be trained with longitudinal data and used to predict long-term CVD risk, thereby informing CVD prevention.
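The abstract's two evaluation ideas, an Ensemble that averages the predicted probabilities of RF, XGB, and Deep Learning, and comparison of models by AUROC, can be illustrated with a small stdlib-only Python sketch. It assumes each model has already produced a CVD probability for every patient (for example, out-of-fold predictions from the 5-fold cross-validation); model training is not shown. Equal-weight averaging, the function names `auroc` and `ensemble_average`, and all patient data below are hypothetical illustrations, not the study's code or results.

```python
from statistics import mean


def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; tied scores share the
    average of the ranks they span, so a tie counts as half a "win".
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")

    pairs = sorted(zip(scores, labels))  # ascending by score
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores occupying 0-based positions i..j-1.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(1 for _, y in pairs[i:j] if y == 1)
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def ensemble_average(*model_probs):
    """Equal-weight average of per-patient probabilities across models (assumed weighting)."""
    return [mean(per_patient) for per_patient in zip(*model_probs)]


# Hypothetical toy data: 6 patients, 1 = CVD (CHD or stroke), 0 = no CVD.
# All probabilities are invented for illustration, not taken from the study.
labels = [0, 0, 1, 1, 0, 1]
lr_probs = [0.30, 0.40, 0.50, 0.35, 0.55, 0.60]
rf_probs = [0.20, 0.40, 0.70, 0.30, 0.50, 0.80]
xgb_probs = [0.10, 0.50, 0.60, 0.40, 0.30, 0.45]
dl_probs = [0.30, 0.20, 0.50, 0.50, 0.60, 0.90]

# Ensemble = average of RF, XGB, and Deep Learning, mirroring the abstract.
ens_probs = ensemble_average(rf_probs, xgb_probs, dl_probs)

for name, probs in [("LR", lr_probs), ("RF", rf_probs), ("XGB", xgb_probs),
                    ("DL", dl_probs), ("Ensemble", ens_probs)]:
    print(f"{name:8s} AUROC = {auroc(labels, probs):.3f}")
```

With these invented numbers the script prints 0.667 for LR, 0.778 for each of RF, XGB, and DL, and 0.889 for the Ensemble: averaging can smooth out errors that individual models make on different patients, though it is not guaranteed to help on real data. The rank-based formula gives the same value as scikit-learn's `roc_auc_score` for binary labels and runs in O(n log n), which matters at the scale of 143,043 patients.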
What problem does this paper attempt to address?