Development of risk models of incident hypertension using machine learning on the HUNT study data

Filip Emil Schjerven, Emma Maria Lovisa Ingeström, Ingelin Steinsland, Frank Lindseth
DOI: https://doi.org/10.1038/s41598-024-56170-7
IF: 4.6
2024-03-09
Scientific Reports
Abstract: In this study, we aimed to create an 11-year hypertension risk prediction model using data from the Trøndelag Health (HUNT) Study in Norway, involving 17 852 individuals (20–85 years; 38% male; 24% incidence rate) with blood pressure (BP) below the hypertension threshold at baseline (1995–1997). We assessed 18 clinical, behavioral, and socioeconomic features, employing machine learning models such as eXtreme Gradient Boosting (XGBoost), Elastic regression, K-Nearest Neighbor, Support Vector Machines (SVM) and Random Forest. For comparison, we used logistic regression and a decision rule as reference models and validated six external models, with focus on the Framingham risk model. The top-performing models consistently included XGBoost, Elastic regression and SVM. These models efficiently identified hypertension risk, even among individuals with optimal baseline BP (< 120/80 mmHg), although improvement over reference models was modest. The recalibrated Framingham risk model outperformed the reference models, approaching the best-performing ML models. Important features included age, systolic and diastolic BP, body mass index, height, and family history of hypertension. In conclusion, our study demonstrated that linear effects sufficed for a well-performing model. The best models efficiently predicted hypertension risk, even among those with optimal or normal baseline BP, using few features. The recalibrated Framingham risk model proved effective in our cohort.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper aims to develop an 11-year risk prediction model for incident hypertension using data from the Trøndelag Health Study (HUNT) in Norway. The research team used data from 17,852 individuals (aged 20 to 85 years, 38% male, 24% hypertension incidence) whose blood pressure was below the hypertension threshold at baseline (1995–1997). The study assessed 18 clinical, behavioral, and socioeconomic features and applied machine-learning models including eXtreme Gradient Boosting (XGBoost), Elastic Regression, K-Nearest Neighbor (KNN), Support Vector Machines (SVM), and Random Forest. For comparison, the study used logistic regression and a decision rule as reference models and validated six external models, with a focus on the Framingham Risk Model.

### Main objectives of the study:

1. **Develop a risk prediction model for hypertension onset**: Use machine-learning methods to create a model based on HUNT study data that predicts an individual's risk of developing hypertension within the next 11 years.
2. **Evaluate model performance**: Compare the developed models with reference models (logistic regression and a decision rule) and with external models (such as the Framingham Risk Model).
3. **Identify important features**: Determine which features are most critical to model performance in order to simplify the model and improve its practicality and accuracy.
4. **External validation**: Validate existing hypertension risk models on HUNT study data to assess how well they generalize.

### Research background:

- Hypertension is a medical condition characterized by persistently elevated blood pressure, which indirectly causes approximately 10 million deaths annually and accounts for 10% of total global health resource expenditure.
- In current hypertension management, interventions are determined mainly by blood pressure measurements, age, and risk factors for other diseases.
- Lifestyle modification is a key intervention at all stages of hypertension management and can effectively prevent and delay the onset of hypertension.
- A risk model that detects high-risk individuals before they develop hypertension would allow personalized prevention strategies to start earlier.

### Method overview:

- **Data sources**: HUNT study data, including baseline data (1995–1997) and follow-up data (2006–2008).
- **Feature selection**: 18 known risk factors for hypertension and cardiovascular disease, including age, systolic and diastolic blood pressure, body mass index (BMI), height, and family history of hypertension.
- **Model development**: Multiple machine-learning methods, including XGBoost, Elastic Regression, SVM, KNN, and Random Forest.
- **Model validation**: Internal validation (training set and test set) and external validation (other published risk models).

### Results:

- **Model performance**: XGBoost, Elastic Regression, and SVM were consistently among the top-performing models and identified hypertension risk efficiently, even in individuals with normal or optimal baseline blood pressure, although the improvement over the reference models was modest.
- **Feature importance**: Age, systolic and diastolic blood pressure, BMI, height, and family history of hypertension were the most important features.
- **External model validation**: The recalibrated Framingham Risk Model outperformed the reference models in the HUNT cohort and approached the best machine-learning models.

### Conclusions:

- Linear effects were sufficient for a well-performing risk prediction model.
- The best models efficiently predicted hypertension risk, even in individuals with normal or optimal baseline blood pressure, using few features.
- The recalibrated Framingham Risk Model proved effective in this cohort, suggesting practical value.
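
The summary does not say how the Framingham model was recalibrated. As a minimal sketch of one common approach, logistic recalibration, the Python below refits only an intercept and a slope on the logit of an external model's predicted risk. Everything in it is an assumption for illustration: the function names (`simulate_cohort`, `fit_recalibration`, `recalibrate`), the synthetic cohort, and the miscalibration parameters are hypothetical and are not taken from the paper or from HUNT data.

```python
import math
import random


def logit(p):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_cohort(n, seed=0):
    """Synthetic cohort (hypothetical): an external model that is miscalibrated here."""
    rng = random.Random(seed)
    external_risk, outcomes = [], []
    for _ in range(n):
        p_ext = rng.uniform(0.05, 0.80)              # risk reported by the external model
        p_true = sigmoid(-0.5 + 0.8 * logit(p_ext))  # assumed true risk in the new cohort
        external_risk.append(p_ext)
        outcomes.append(1 if rng.random() < p_true else 0)
    return external_risk, outcomes


def fit_recalibration(pred_risk, outcomes, n_iter=25):
    """Fit a, b in logit(p_new) = a + b * logit(p_old) by Newton-Raphson."""
    x = [logit(p) for p in pred_risk]
    a, b = 0.0, 1.0  # start from "no recalibration"
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # gradient w.r.t. intercept
            g1 += (yi - p) * xi   # gradient w.r.t. slope
            h00 += w              # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: solve the 2x2 system in closed form
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def recalibrate(p, a, b):
    """Apply the fitted intercept and slope to one predicted risk."""
    return sigmoid(a + b * logit(p))


risk, y = simulate_cohort(2000)
a, b = fit_recalibration(risk, y)
recal = [recalibrate(p, a, b) for p in risk]
print(f"intercept={a:.2f}, slope={b:.2f}")
print(f"observed incidence:        {sum(y) / len(y):.3f}")
print(f"mean risk, external model: {sum(risk) / len(risk):.3f}")
print(f"mean risk, recalibrated:   {sum(recal) / len(recal):.3f}")
```

Because only two parameters are refit, the ranking of individuals is unchanged whenever the fitted slope is positive, so discrimination stays the same and only calibration changes. In practice the intercept and slope would be fit on a training split and checked on held-out data.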