Machine Learning Models for Cardiovascular Disease Prediction: A Comparative Study

Chao Yan, Yiluan Xing, Sensen Liu, Erdi Gao, Jinyin Wang
DOI: https://doi.org/10.1101/2024.05.27.596092
2024-06-01
Abstract: Cardiovascular diseases (CVDs) pose a significant threat to global public health, affecting individuals across various age groups. Factors such as cholesterol levels, smoking, alcohol consumption, and physical inactivity contribute to their onset and progression. Enhancing our understanding of CVD etiology and informing targeted interventions for disease prevention and management remains a critical challenge. In this study, we address the task of predicting the likelihood of individuals developing CVDs using machine learning techniques. Specifically, we explore three approaches: the k-nearest neighbors (KNN) algorithm, logistic regression, and the random forest algorithm. Leveraging a comprehensive dataset sourced from Kaggle, encompassing 11 relevant factors, we conduct a series of experiments to identify the most influential predictors of CVDs. Our analysis aims not only to forecast disease occurrence but also to elucidate the primary determinants contributing to its manifestation. Through comparative analysis of the three methodologies, we demonstrate that the random forest algorithm exhibits superior performance in terms of predictive accuracy. This research represents a significant step towards leveraging machine learning techniques to enhance our understanding of CVD dynamics and inform targeted interventions for disease prevention and management.
Cancer Biology
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to predict the likelihood of individuals developing cardiovascular disease (CVD) using machine learning techniques. Specifically, the researchers compare three machine learning methods (K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest) and use them to identify the most important predictors of cardiovascular disease. The goal of the study is not only to predict the occurrence of the disease but also to reveal the main factors that lead to it.

The core contributions of the paper are as follows:

1. **Data Source**: The dataset used in the study comes from Kaggle and contains 11 relevant variables.
2. **Method Comparison**: The performance of the KNN, Logistic Regression, and Random Forest methods was compared through experiments.
3. **Result Analysis**: The results show that the Random Forest algorithm performs best in terms of prediction accuracy.
4. **Clinical Application**: The study's findings help improve the understanding of cardiovascular disease dynamics and can inform targeted interventions for prevention and management.

In summary, this study aims to enhance the understanding of cardiovascular disease risk through machine learning techniques and to provide data-driven support for clinical decision support systems.
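
The three-way comparison described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data is a synthetic stand-in generated with `make_classification` (11 features, to mirror the number of variables in the Kaggle dataset), and the train/test split, the feature scaling, and every hyperparameter (`n_neighbors=5`, `n_estimators=200`, `max_iter=1000`) are assumptions chosen for the example. The accuracies it prints say nothing about the paper's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data standing in for the Kaggle CVD dataset
# (11 features, binary label). Replace with the real table to reproduce.
X, y = make_classification(
    n_samples=2000, n_features=11, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# KNN and logistic regression are scale-sensitive, so they get a scaler;
# the random forest is trained on the raw features.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, acc in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")

# Impurity-based importances from the forest give one possible ranking
# of the most influential predictors (feature indices, most important first).
importances = models["random_forest"].feature_importances_
ranked = sorted(enumerate(importances), key=lambda kv: kv[1], reverse=True)
for index, weight in ranked[:3]:
    print(f"feature {index}: importance {weight:.3f}")
```

With the real dataset, the feature indices would map to named columns, so the importance ranking could be read directly as a list of candidate risk factors. Impurity-based importances are only one option; permutation importance is a common alternative when features are correlated.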