Coronary Heart Disease Risk Prediction Using Binary Logistic Regression Based on Principal Component Analysis

M Fauzan Azhari, Farah Ayu Fitriani
DOI: https://doi.org/10.20885/enthusiastic.vol2.iss1.art6
2022-05-17
Abstract: According to the World Health Organization (WHO), coronary heart disease, one type of heart disease, is the deadliest disease in the world. In 2016, at least 9.4 million people died of coronary heart disease. In Indonesia, deaths caused by heart disease, blood vessel disease (CVD), and respiratory disorders are the fourth highest in ASEAN (23.1%). Given the danger of coronary heart disease, a system or model is needed that can predict heart disease early, so that it can be treated early and the death rate from heart disease can be reduced. This study uses principal component analysis (PCA) to form linear combinations of highly correlated variables so that multicollinearity in the data is resolved. For prediction, the study uses binary logistic regression to predict heart disease from existing factors. The PCA yields 7 components that together explain 72.9% of the total variance. The Bartlett test on the PCA data gives a p-value of 1, which means that there is no multicollinearity in the data. Predictive analysis using binary logistic regression on the PCA data was shown to increase accuracy to 85%.
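The two-stage procedure described in the abstract (PCA to remove multicollinearity, then binary logistic regression on the component scores) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data are synthetic, and the 90% variance cutoff, the learning rate, the iteration count, and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data (NOT the study's data): three independent latent
# factors, each observed twice with small noise, so the six predictors
# form three highly correlated pairs.
factors = rng.normal(size=(n, 3))
X = np.repeat(factors, 2, axis=1) + 0.1 * rng.normal(size=(n, 6))
y = (factors @ np.array([1.0, 0.5, -0.5]) + 0.5 * rng.normal(size=n) > 0).astype(float)

# Step 1: PCA on standardized predictors via the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]            # eigh returns ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = np.cumsum(eigvals) / eigvals.sum()

# Keep the fewest components reaching the (assumed) 90% variance cutoff.
k = min(int(np.searchsorted(explained, 0.9)) + 1, X.shape[1])
scores = Z @ eigvecs[:, :k]                  # mutually uncorrelated scores

# Step 2: binary logistic regression on the scores, fitted by plain
# gradient descent on the mean log-loss (intercept in column 0).
A = np.column_stack([np.ones(n), scores])
w = np.zeros(k + 1)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-A @ w))         # predicted probabilities
    w -= 0.1 * A.T @ (p - y) / n             # gradient step

# Training accuracy at a 0.5 probability threshold (linear predictor >= 0).
accuracy = np.mean((A @ w >= 0) == (y == 1))
print(f"components kept: {k}, variance explained: {explained[k - 1]:.3f}")
print(f"training accuracy: {accuracy:.3f}")
```

Because the component scores are projections onto orthogonal eigenvectors of the correlation matrix, they are uncorrelated by construction, which is why a regression fitted on them no longer suffers from multicollinearity. In practice a held-out test set would be needed to estimate predictive accuracy; the sketch reports training accuracy only.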