Abstract

Background: Machine learning (ML) models may offer advantages over 'traditional' regression models in heart failure (HF) prediction.

Objective: To compare the performance of Cox proportional hazards (PH) models and ML survival models for incident HF in men and women without prevalent ischemic heart disease (IHD). We also aimed to identify potential high-risk precursors otherwise ignored by conventional survival models, and to investigate differences between sex-specific models.

Methods: We included 476,393 participants (55.6% women) from the UK Biobank, after excluding participants with a history of HF or IHD, and defined sex-specific datasets. We predicted incident HF events using over 400 baseline characteristics. We constructed multivariable Cox PH models, first including all predictor variables and subsequently only those remaining after LASSO stability selection. We also developed two supervised ML models: Random Survival Forest (RSF) and eXtreme Gradient Survival Boosting (XGBoost). We identified the 15 most important sex-specific predictors in each model and compared model performance using the C-index. Models were validated using hold-out sets.

Results: During 12.3 ± 1.9 years of follow-up, 4680 (1.76%) women and 6631 (3.14%) men developed incident HF. XGBoost showed the best performance during model training (C-index, training: 0.89 in men, 0.97 in women; validation: 0.77 in men, 0.80 in women). The multivariable Cox model performed second-best (C-index, training: 0.78 in men, 0.82 in women; validation: 0.76 in men, 0.78 in women). RSF performed slightly worse (C-index, training: 0.75 in men, 0.79 in women; validation: 0.75 in men, 0.79 in women) but showed no drop in performance during validation. The Cox model with LASSO stability selection performed similarly to RSF. Age, self-reported lifetime treatments and medications, cystatin-C, waist circumference and FEV1 scores were identified as strong risk factors in all models for both sexes. Reduced albumin levels and elevated HbA1c were more strongly associated with high risk in men, while elevated systolic blood pressure showed higher importance in women. The traditional Cox models identified CRP as important only in men, while the ML models identified CRP as important for both sexes. Neutrophil count was considered a strong risk factor in both sexes in the traditional Cox models, yet it was not among the most important predictors in either ML model. Presence of other heart disease (which included, among others, pericardial disease, valve disorders and arrhythmias) was an important predictor variable only in the ML models.

Conclusion: ML models showed similar performance to Cox PH models for HF prediction. Despite this, differences in predictor importance were identified between models. Sex-specific risk predictors were found, and FEV1 score, which is not commonly included in existing models, was identified as an important risk factor. These results suggest that ML models may reveal additional insights that would otherwise remain unnoticed.
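
The abstract compares all models by C-index. As an illustration of what that metric measures, the following is a minimal sketch of Harrell's concordance index for right-censored data. The abstract does not state which C-index variant was used, so Harrell's definition is an assumption here, and the function name `concordance_index` and the toy data are hypothetical, not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       : observed follow-up time per subject
    events      : 1 if the event (e.g. incident HF) was observed, 0 if censored
    risk_scores : model output, higher meaning higher predicted risk

    A pair is comparable only if the subject with the shorter follow-up
    actually had the event. Pairs with tied times are skipped, which is a
    simplification made for this sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied follow-up times
        # a is the subject with the shorter follow-up, b the longer one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five subjects, two of them censored.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.3, 0.2, 0.7, 0.1]

c_index = concordance_index(toy_times, toy_events, toy_risk)
print(f"C-index: {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a gap between training and validation values (as reported for XGBoost) indicates how much ranking ability fails to carry over to unseen participants.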

Comprehensive machine learning models for prediction of heart failure in 476,393 women and men from the UK Biobank reveal sex differences and underutilized risk factors
