A deep learning model for early risk prediction of heart failure with preserved ejection fraction by DNA methylation profiles combined with clinical features

Xuetong Zhao, Yang Sui, Xiuyan Ruan, Xinyue Wang, Kunlun He, Wei Dong, Hongzhu Qu, Xiangdong Fang
DOI: https://doi.org/10.1186/s13148-022-01232-8
2022-01-19
Clinical Epigenetics
Abstract: Background: Heart failure with preserved ejection fraction (HFpEF), which is shaped jointly by genetic and environmental factors, is a common subtype of chronic heart failure. Although the available risk assessment methods for HFpEF have made some progress, they are based on clinical or genetic features alone. Here, we developed a deep learning framework, HFmeRisk, that uses 5 clinical features and 25 DNA methylation loci to predict the early risk of HFpEF in the Framingham Heart Study cohort. Results: The framework incorporates feature selection based on the Least Absolute Shrinkage and Selection Operator (LASSO) and Extreme Gradient Boosting (XGBoost), as well as a recommender system built on a factorization-machine-based neural network. Model discrimination and calibration were assessed using the AUC and the Hosmer–Lemeshow test, respectively. HFmeRisk, comprising 25 CpGs and 5 clinical features, achieved an AUC of 0.90 (95% confidence interval 0.88–0.92) and a Hosmer–Lemeshow statistic of 6.17 (P = 0.632), outperforming models with clinical characteristics or DNA methylation levels alone, published chronic heart failure risk prediction models, and other benchmark machine learning models. Among the 25 CpGs, the DNA methylation levels of two were significantly correlated with the paired transcriptome levels (R < −0.3, P < 0.05). In addition, the DNA methylation loci in HFmeRisk were associated with intercellular signaling and interaction, amino acid metabolism, transport and activation, and the clinical variables were all related to the mechanisms by which HFpEF occurs. Together, these findings provide new evidence supporting the HFmeRisk model. Conclusion: Our study proposes an early risk assessment framework for HFpEF that integrates clinical and epigenetic features, providing a promising path for clinical decision making.
oncology, genetics & heredity
What problem does this paper attempt to address?
This paper aims to develop a method for early prediction of the risk of heart failure with preserved ejection fraction (HFpEF). Specifically, the authors combine DNA methylation profiles with clinical features in a deep-learning model to improve the accuracy of early risk prediction for HFpEF.

### Background and Problem Statement

Chronic heart failure (CHF) is a common chronic disease. Based on left ventricular ejection fraction (LVEF), it is divided into three subtypes: heart failure with reduced ejection fraction (HFrEF, LVEF ≤ 40%), heart failure with mid-range ejection fraction (HFmrEF, 40% < LVEF ≤ 50%), and heart failure with preserved ejection fraction (HFpEF, LVEF > 50%). HFpEF is the most common subtype, accounting for approximately 50% of all CHF patients. Although some risk-assessment methods for CHF exist, most rely on a single type of feature, either clinical or genetic, and do not integrate multi-omics data.

### Research Objectives

To overcome the limitations of existing methods, the authors propose a deep-learning framework named HFmeRisk, which combines 5 clinical features and 25 DNA methylation sites to predict the early risk of HFpEF. The main objectives of the study are:

1. **Develop an integrated model**: Combine DNA methylation and clinical features to construct a more accurate early-risk prediction model for HFpEF.
2. **Verify model performance**: Evaluate the discrimination and calibration of HFmeRisk and compare it with existing risk-prediction models.
3. **Explore biological mechanisms**: Analyze the association between DNA methylation sites and gene expression to reveal potential biological mechanisms of HFpEF.

### Method Overview

- **Data sources**: Data from the Framingham Heart Study (FHS) Offspring Cohort, including clinical information, DNA methylation data, and gene expression data.
- **Feature selection**: The LASSO and XGBoost algorithms are used for feature selection, yielding 5 clinical features and 25 DNA methylation sites.
- **Model construction**: The HFmeRisk model is built with DeepFM (a factorization-machine-based neural network), which automatically learns nonlinear feature interactions.
- **Model evaluation**: Performance is measured by the AUC (area under the receiver operating characteristic curve) and the Hosmer–Lemeshow test, and decision-curve analysis is used to assess clinical utility.

### Main Results

- The AUC of the HFmeRisk model on the test set is 0.90 (95% CI 0.88–0.92), outperforming models that use only clinical features or only DNA methylation features.
- The model is well calibrated, with a Hosmer–Lemeshow statistic of 6.17 and a P-value of 0.632.
- Decision-curve analysis shows that the HFmeRisk model has a higher net benefit across most threshold probabilities.

### Conclusion

This study proposes HFmeRisk, an early-risk assessment framework that combines clinical and epigenetic features, providing a new approach to the early prediction of HFpEF and supporting clinical decision-making.
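
The two evaluation metrics named above, the AUC for discrimination and the Hosmer–Lemeshow statistic for calibration, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the ten-group default, the `groups - 2` degrees of freedom, and the toy data are all assumptions chosen for illustration, and the tail-probability helper uses a closed form that holds only for an even number of degrees of freedom.

```python
import math


def auc_rank(y_true, y_score):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square over risk-sorted groups of near-equal size.

    The default of 10 groups is a common convention, assumed here.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue  # fewer samples than groups
        expected = sum(p for p, _ in chunk)   # sum of predicted risks
        observed = sum(y for _, y in chunk)   # number of observed events
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Toy data (hypothetical): predicted risks and observed outcomes.
toy_prob = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
toy_true = [0, 0, 0, 1, 0, 1, 1, 1]

toy_auc = auc_rank(toy_true, toy_prob)
toy_hl = hosmer_lemeshow(toy_true, toy_prob, groups=4)
toy_p = chi2_sf_even_df(toy_hl, df=4 - 2)  # df = groups - 2 by convention
print(toy_auc, toy_hl, toy_p)
```

A large Hosmer–Lemeshow P-value means the test found no evidence of miscalibration, which is how the reported P = 0.632 is read. The number of risk groups used in the paper is not stated in this summary, so the grouping here should not be taken as a reproduction of the reported statistic.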