Abstract:

Background: Identifying groups at high risk of coronary heart disease (CHD) is important to reduce mortality due to CHD. Although machine learning methods have been introduced, many require laboratory or imaging parameters, which are not always readily available; thus, their wide application is limited.

Objective: The aim of this study was to develop and validate a simple, efficient, joint machine learning model for identifying individuals at high risk of CHD using easily obtainable nonlaboratory parameters.

Methods: This prospective study used data from the Henan Rural Cohort Study, which was conducted in rural areas of Henan Province, China, between July 2015 and September 2017. A joint machine learning model was developed by selecting and combining four base machine learning algorithms: logistic regression (LR), artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM). We used readily accessible variables, including demographics, medical and family history, lifestyle and dietary factors, and anthropometric data, to inform the model. The model was also externally validated in a cohort of individuals from the Dongfeng-Tongji cohort study. Model discrimination was assessed by using the area under the receiver operating characteristic curve (AUC), and calibration was measured by using the Brier score (BS).

Results: A total of 38 716 participants (mean [SD] age, 55.64 [12.19] years; 23449 [60.6%] female) from the Henan Rural Cohort Study and 17 958 subjects (mean [SD] age, 62.74 [7.59] years; 10076 [56.1%] female) from the Dongfeng-Tongji cohort study were included in the analysis. Age, waist circumference, pulse pressure, heart rate, family history of CHD, education level, family history of type 2 diabetes mellitus (T2DM), and family history of dyslipidaemia were strongly associated with the development of CHD. In internal validation, the model demonstrated good discrimination (AUC, 0.844 [95% CI 0.828-0.860]) and acceptable calibration (BS, 0.066). In external validation, the model showed useful discrimination (AUC, 0.792 [95% CI 0.774-0.810]) and robust calibration (BS, 0.069).

Conclusions: In this study, the novel, simple machine learning-based model comprising readily accessible variables accurately identified individuals at high risk of CHD. This model has the potential to be widely applied for large-scale screening of CHD, especially in medical resource-constrained settings.

Trial Registration: The Henan Rural Cohort Study has been registered at the Chinese Clinical Trial Register (trial registration: ChiCTR-OOC-15006699; registered 6 July 2015, retrospectively registered). http://www.chictr.org.cn/showproj.aspx?proj=11375
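The two evaluation metrics named in the abstract, AUC for discrimination and the Brier score for calibration, can be illustrated with a minimal pure-Python sketch. The abstract does not state how the four base algorithms were combined, so the unweighted mean in `average_probabilities` is only an assumed stand-in for the joint model, and the function names and toy numbers are hypothetical rather than taken from the study.

```python
from typing import List, Sequence


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Combine base-model probabilities with an unweighted mean.

    Assumption: the abstract does not describe the combination rule, so a
    simple average is used here purely for illustration.
    """
    n_models = len(model_probs)
    return [sum(per_person) / n_models for per_person in zip(*model_probs)]


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1).

    Lower is better; 0 means perfect probabilistic predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case, with ties counted as one half.
    """
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in cases:
        for q in non_cases:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Hypothetical outcomes (1 = CHD) and base-model probabilities.
    outcomes = [0, 0, 1, 1, 0, 1]
    lr_probs = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
    rf_probs = [0.20, 0.30, 0.45, 0.90, 0.50, 0.60]
    joint = average_probabilities([lr_probs, rf_probs])
    print("AUC:", auc(outcomes, joint))
    print("Brier score:", brier_score(outcomes, joint))
```

The pairwise AUC is quadratic in the number of participants, which is fine for a toy example; for cohorts of the size reported here, a rank-based implementation or an established library routine would be the practical choice.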

Construction and Validation of a Predictive Model for Coronary Artery Disease Using Extreme Gradient Boosting

Development and Validation of a Predictive Model for Coronary Artery Disease Using Machine Learning

Prediction of presence and severity of coronary artery disease using prediction for atherosclerotic cardiovascular disease risk in China scoring system

Improving Cardiovascular Risk Prediction Through Machine Learning Modelling of Irregularly Repeated Electronic Health Records

Using machine learning-based algorithms to construct cardiovascular risk prediction models for Taiwanese adults based on traditional and novel risk factors

Improvement of the Performance of Models for Predicting Coronary Artery Disease Based on XGBoost Algorithm and Feature Processing Technology

Study on the risk of coronary heart disease in middle-aged and young people based on machine learning methods: a retrospective cohort study

Machine learning to predict hemodynamically significant CAD based on traditional risk factors, coronary artery calcium and epicardial fat volume

A Machine Learning Model Based on Genetic and Traditional Cardiovascular Risk Factors to Predict Premature Coronary Artery Disease

Machine learning models using symptoms and clinical variables to predict coronary artery disease on coronary angiography

Use machine learning models to identify and assess risk factors for coronary artery disease

Nonlaboratory-based risk assessment model for coronary heart disease screening: Model development and validation

Comparing the performance of machine learning and conventional models for predicting atherosclerotic cardiovascular disease in a general Chinese population

Machine learning-based models for prediction of the risk of stroke in coronary artery disease patients receiving coronary revascularization

Machine learning-based prediction of composite risk of cardiovascular events in patients with stable angina pectoris combined with coronary heart disease: development and validation of a clinical prediction model for Chinese patients

Risk Prediction of Major Adverse Cardiovascular Events Occurrence Within 6 Months After Coronary Revascularization: Machine Learning Study

Development, evaluation and validation of machine learning models to predict hospitalizations of patients with coronary artery disease within the next 12 months

Machine learning for the prediction of atherosclerotic cardiovascular disease during 3-year follow up in Chinese type 2 diabetes mellitus patients

[A pretest model of obstructive coronary artery disease based on machine learning: from the C-Strat study]

Prediction of Hidden Coronary Artery Disease Using Machine Learning in Patients With Acute Ischemic Stroke

Machine-learning based prediction of obstructive coronary artery disease using integrated submodules of clinical information, chest x-ray, and electrocardiography