Abstract:

Importance: Risk estimation is an integral part of cardiovascular care. Local recalibration of guideline-recommended models could address the limitations of existing tools.

Objective: To provide a machine learning (ML) approach to augment the performance of the American Heart Association's Predicting Risk of Cardiovascular Disease Events (AHA-PREVENT) equations when applied to a local population while preserving clinical interpretability.

Design, setting, and participants: This cohort study used a New England-based electronic health record cohort of patients without prior atherosclerotic cardiovascular disease (ASCVD) who had the data necessary to calculate the AHA-PREVENT 10-year risk of developing ASCVD in the event period (2007-2016). Patients with prior ASCVD events, death before 2007, or age 79 years or older in 2007 were excluded. The final study population of 95 326 patients was split into 3 nonoverlapping subsets for training, testing, and validation. The AHA-PREVENT model was adapted to this local population using an open-source ML model (MLM), Extreme Gradient Boosting (XGBoost), with minimal predictor variables, including age, sex, and AHA-PREVENT. The MLM was monotonically constrained to preserve known associations between risk factors and ASCVD risk. Along with sex, race and ethnicity data from the electronic health record were collected to validate the performance of ASCVD risk prediction in subgroups. Data were analyzed from August 2021 to February 2024.

Main outcomes and measures: Consistent with the AHA-PREVENT model, ASCVD events were defined as the first occurrence of nonfatal myocardial infarction, coronary artery disease, ischemic stroke, or cardiovascular death. Cardiovascular death was coded via government registries.
Discrimination, calibration, and risk reclassification were assessed using the Harrell C index, a modified Hosmer-Lemeshow goodness-of-fit test and calibration curves, and reclassification tables, respectively.

Results: In the test set of 38 137 patients (mean [SD] age, 64.8 [6.9] years; 22 708 [59.5%] women and 15 429 [40.5%] men; 935 [2.5%] Asian, 2153 [5.6%] Black, 1414 [3.7%] Hispanic, 31 400 [82.3%] White, and 2235 [5.9%] other, including American Indian, multiple races, unspecified, and unrecorded, consolidated owing to small numbers), MLM-PREVENT had improved calibration (modified Hosmer-Lemeshow P > .05) compared with the AHA-PREVENT model across risk categories in the overall cohort ($\chi^2_3 = 2.2$; P = .53 vs $\chi^2_3 > 16.3$; P < .001) and sex subgroups (men: $\chi^2_3 = 2.1$; P = .55 vs $\chi^2_3 > 16.3$; P < .001; women: $\chi^2_3 = 6.5$; P = .09 vs $\chi^2_3 > 16.3$; P < .001), while also surpassing a traditional recalibration approach. MLM-PREVENT maintained or improved AHA-PREVENT's calibration in Asian, Black, and White individuals. MLM-PREVENT and AHA-PREVENT performed equally well in discriminating risk (approximate ΔC index, ±0.01). Using a clinically significant 7.5% risk threshold, MLM-PREVENT reclassified a total of 11.5% of patients. We visualize the recalibration through MLM-PREVENT ASCVD risk charts that highlight preserved risk associations of the original AHA-PREVENT model.

Conclusions and relevance: The interpretable ML approach presented in this article enhanced the accuracy of the AHA-PREVENT model when applied to a local population while still preserving the risk associations found by the original model. This method has the potential to recalibrate other established risk tools and is implementable in electronic health record systems for improved cardiovascular risk assessment.
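
The reclassification and calibration measures named above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, the calibration statistic is the classic binary-outcome Hosmer-Lemeshow form rather than the modified test used in the study, and the cut points (5%, 7.5%, and 20%) are assumed for illustration. Four categories would be consistent with the 3 degrees of freedom reported, but the abstract does not list the categories.

```python
from collections import Counter

THRESHOLD = 0.075  # the 7.5% risk threshold named in the abstract


def reclassification_table(old_risk, new_risk, threshold=THRESHOLD):
    """Cross-tabulate patients by the side of the threshold each model assigns.

    Keys are (old_is_high, new_is_high); the second return value is the
    fraction of patients who cross the threshold in either direction.
    """
    table = Counter()
    for old, new in zip(old_risk, new_risk):
        table[(old >= threshold, new >= threshold)] += 1
    moved = table[(True, False)] + table[(False, True)]
    return table, moved / len(old_risk)


def hosmer_lemeshow_by_category(risk, event, cut_points=(0.05, 0.075, 0.20)):
    """Classic Hosmer-Lemeshow statistic over fixed risk categories.

    ASSUMPTION: binary outcomes and illustrative cut points; the study used
    a modified version of the test whose details are not in the abstract.
    """
    n_bins = len(cut_points) + 1
    n = [0] * n_bins
    observed = [0] * n_bins
    expected = [0.0] * n_bins
    for p, y in zip(risk, event):
        k = sum(p >= c for c in cut_points)  # index of the risk category
        n[k] += 1
        observed[k] += y
        expected[k] += p
    stat = 0.0
    for k in range(n_bins):
        if n[k] == 0:
            continue  # empty category contributes nothing
        mean_p = expected[k] / n[k]
        if mean_p <= 0.0 or mean_p >= 1.0:
            continue  # degenerate category, variance is zero
        stat += (observed[k] - expected[k]) ** 2 / (n[k] * mean_p * (1 - mean_p))
    return stat


if __name__ == "__main__":
    # Toy predictions (made up, not study data).
    old = [0.04, 0.06, 0.08, 0.10, 0.30]
    new = [0.03, 0.08, 0.07, 0.12, 0.25]
    table, fraction = reclassification_table(old, new)
    print(dict(table), fraction)
    print(hosmer_lemeshow_by_category(new, [0, 0, 0, 1, 0]))
```

A smaller statistic means observed event counts sit closer to the predicted counts in each category. Comparing it against a chi-square distribution gives the P value, where P > .05 is read as no evidence of miscalibration.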
