Abstract

Background: Identifying groups at high risk of coronary heart disease (CHD) is important for reducing CHD mortality. Although machine learning methods have been introduced, many require laboratory or imaging parameters, which are not always readily available; thus, their wide application is limited.

Objective: The aim of this study was to develop and validate a simple, efficient, joint machine learning model for identifying individuals at high risk of CHD using easily obtainable nonlaboratory parameters.

Methods: This prospective study used data from the Henan Rural Cohort Study, which was conducted in rural areas of Henan Province, China, between July 2015 and September 2017. A joint machine learning model was developed by selecting and combining four base machine learning algorithms: logistic regression (LR), artificial neural network (ANN), random forest (RF), and gradient boosting machine (GBM). The model was built on readily accessible variables, including demographics, medical and family history, lifestyle and dietary factors, and anthropometric data. The model was also externally validated in a cohort of individuals from the Dongfeng-Tongji cohort study. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC), and calibration was measured using the Brier score (BS).

Results: A total of 38 716 participants (mean [SD] age, 55.64 [12.19] years; 23 449 [60.6%] female) from the Henan Rural Cohort Study and 17 958 participants (mean [SD] age, 62.74 [7.59] years; 10 076 [56.1%] female) from the Dongfeng-Tongji cohort study were included in the analysis. Age, waist circumference, pulse pressure, heart rate, family history of CHD, education level, family history of type 2 diabetes mellitus (T2DM), and family history of dyslipidaemia were strongly associated with the development of CHD. In internal validation, the model demonstrated good discrimination (AUC, 0.844; 95% CI, 0.828-0.860) and acceptable calibration (BS, 0.066). In external validation, the model performed well, with clearly useful discrimination (AUC, 0.792; 95% CI, 0.774-0.810) and robust calibration (BS, 0.069).

Conclusions: In this study, a novel, simple machine learning-based model comprising readily accessible variables accurately identified individuals at high risk of CHD. This model has the potential to be widely applied for large-scale CHD screening, especially in medical resource-constrained settings.

Trial Registration: The Henan Rural Cohort Study has been registered at the Chinese Clinical Trial Register (Trial registration: ChiCTR-OOC-15006699. Registered 6 July 2015 - Retrospectively registered). http://www.chictr.org.cn/showproj.aspx?proj=11375
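The two evaluation metrics named in the Methods can be illustrated with a minimal Python sketch. The abstract does not state how the four base algorithms were combined, so the unweighted averaging of predicted probabilities below is an assumption made purely for illustration, not the study's method. The function names (`average_probabilities`, `auc`, `brier_score`) are hypothetical, and the toy outcomes and probabilities are invented, not study data.

```python
from statistics import mean


def average_probabilities(per_model_probs):
    """Unweighted mean of base-model probabilities for each individual.

    ASSUMPTION: simple averaging stands in for the unspecified
    combination rule of the joint model.
    """
    return [mean(probs) for probs in zip(*per_model_probs)]


def auc(y_true, y_prob):
    """Discrimination: probability that a randomly chosen case receives a
    higher predicted risk than a randomly chosen non-case (ties count 0.5)."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))


def brier_score(y_true, y_prob):
    """Calibration: mean squared difference between the predicted
    probability and the observed 0/1 outcome (lower is better)."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))


if __name__ == "__main__":
    # Invented toy data: observed CHD status and hypothetical base-model outputs.
    outcomes = [0, 0, 1, 1, 0, 1]
    base_model_probs = [
        [0.10, 0.30, 0.60, 0.80, 0.20, 0.40],  # hypothetical LR
        [0.05, 0.25, 0.70, 0.90, 0.30, 0.35],  # hypothetical ANN
        [0.15, 0.35, 0.55, 0.75, 0.25, 0.45],  # hypothetical RF
        [0.10, 0.20, 0.65, 0.85, 0.15, 0.50],  # hypothetical GBM
    ]
    joint = average_probabilities(base_model_probs)
    print("AUC:", auc(outcomes, joint))
    print("Brier score:", brier_score(outcomes, joint))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while a Brier score of 0 indicates perfect probabilistic predictions. In practice, library implementations such as those in scikit-learn would normally be used instead of hand-written versions.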

A Machine Learning Model Based on Genetic and Traditional Cardiovascular Risk Factors to Predict Premature Coronary Artery Disease

Comparison of Machine Learning Models and Framingham Risk Score for the prediction of the presence and severity of Coronary Artery Diseases by using Gensini Score

A machine learning-based approach for the prediction of periprocedural myocardial infarction by using routine data

Development and Validation of a Predictive Model for Coronary Artery Disease Using Machine Learning

Improving Cardiovascular Risk Prediction Through Machine Learning Modelling of Irregularly Repeated Electronic Health Records

Using machine learning-based algorithms to construct cardiovascular risk prediction models for Taiwanese adults based on traditional and novel risk factors

Comparing the performance of machine learning and conventional models for predicting atherosclerotic cardiovascular disease in a general Chinese population

Use machine learning models to identify and assess risk factors for coronary artery disease

Construction and Validation of a Predictive Model for Coronary Artery Disease Using Extreme Gradient Boosting

Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study

Machine learning-aided risk stratification system for the prediction of coronary artery disease

Machine learning for the prediction of atherosclerotic cardiovascular disease during 3-year follow up in Chinese type 2 diabetes mellitus patients

Risk factors for high CAD-RADS scoring in CAD patients revealed by machine learning methods: a retrospective study

Study on the risk of coronary heart disease in middle-aged and young people based on machine learning methods: a retrospective cohort study

Nonlaboratory-based risk assessment model for coronary heart disease screening: Model development and validation

Machine learning-based prediction of composite risk of cardiovascular events in patients with stable angina pectoris combined with coronary heart disease: development and validation of a clinical prediction model for Chinese patients

Machine Learning to Predict Long-Term Cardiac-Relative Prognosis in Patients With Extra-Cardiac Vascular Disease

Deep Phenotyping and Prediction of Long-term Cardiovascular Disease: Optimized by Machine Learning

Machine learning to predict hemodynamically significant CAD based on traditional risk factors, coronary artery calcium and epicardial fat volume

Machine Learning for Early Prediction of Major Adverse Cardiovascular Events After First Percutaneous Coronary Intervention in Patients With Acute Myocardial Infarction: Retrospective Cohort Study

Machine learning improves mortality prediction in three-vessel disease