Machine learning prediction of mortality in venous thromboembolism patients: the Birmingham Black Country Venous Thromboembolism (BBC-VTE) cohort
W El-Bouri, A Sanders, G Y H Lip, BBC-VTE Investigators
DOI: https://doi.org/10.1093/eurheartj/ehab724.3059
IF: 39.3
2021-10-01
European Heart Journal
Abstract:
Introduction: Venous thromboembolism (VTE), including deep vein thrombosis (DVT) and pulmonary embolism (PE), is one of the main causes of preventable death in hospitals in the UK. The current clinical risk scores for predicting mortality in patients with VTE are the pulmonary embolism severity index (PESI) and the simplified PESI (sPESI), which have similar predictive power.
Purpose: To evaluate the ability of machine learning algorithms to predict mortality in patients admitted with VTE and to compare their predictive capability with the sPESI score for 30-day mortality.
Methods: The BBC-VTE was a retrospective multicentre patient cohort established to determine clinical features and novel aspects of risk prediction for VTE (and VTE-related complications) in a contemporary cohort. We included 1554 patients (mean age 65.6 years; 53% female) who represent all consecutive admissions with a final diagnosis of VTE to one of 3 regional hospitals in the West Midlands, UK during the years 2012–2014. The dataset was split into training (70%) and validation (30%) cohorts. We trained two tree-based models, Random Forests (RF) and XGBoost (XG), using 5-fold cross-validation on the training cohort to predict patient mortality. The models were evaluated on the held-out validation cohort and compared with a simple logistic regression model. To provide a comparison with the sPESI score, we extracted a sub-group of patients (n=652) who had values for oxygen saturation, systolic blood pressure, heart rate, history of cancer, history of cardiopulmonary disease, and age. We used RF to predict mortality using (i) only the sPESI variables listed and (ii) all the clinical variables available to us. These predictions were then compared against the standard sPESI prediction for this sub-group. C-indices (AUC) were used for comparison.
Results: The c-indices for RF and XG using the full patient cohort were 0.85 [95% CI: 0.80–0.90] (Fig. 1a) and 0.82 [95% CI: 0.77–0.87], respectively, with the logistic regression c-index being 0.83 [95% CI: 0.78–0.88]. The reported sPESI c-index (0.75 [95% CI: 0.69–0.80]) was significantly smaller (p<0.05) than the RF c-index. The most important features for prediction of mortality indicated by the RF algorithm were age, admission blood levels, discharge oral anticoagulation, and previous malignancy (Fig. 2). The sPESI score c-index for the sub-group of patients was 0.72. In comparison, RF with the same variables gave a significantly larger (p<0.05) c-index of 0.78 [95% CI: 0.73–0.83]. When all available clinical variables were used, the c-index increased to 0.85 [95% CI: 0.80–0.90] (Fig. 1b).
Conclusion: Application of machine learning using simple clinical variables in hospital settings can improve prediction of mortality after a VTE event beyond the current simplified PESI risk score. A prospective study is warranted to validate the algorithm on external datasets and to construct individualised risk predictions.
Funding Acknowledgement: Type of funding sources: None.
Figure 1. ROC curve comparisons with sPESI
Figure 2. Feature Importances
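Illustrative sketch (not part of the published abstract): the sPESI baseline and the c-index used for comparison can be written in a few lines of Python. The abstract names the six sPESI variables but not their thresholds; the cut-offs below are the commonly published sPESI criteria and are an assumption here, as are the `Patient` record, the function names, and the example values, all of which are hypothetical. This is not the authors' code, and it does not reproduce the RF or XGBoost models.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Patient:
    """Hypothetical record holding the six sPESI inputs named in the Methods."""
    age: float                # years
    cancer: bool              # history of cancer
    cardiopulmonary: bool     # history of cardiopulmonary disease
    heart_rate: float         # beats per minute
    systolic_bp: float        # mmHg
    oxygen_saturation: float  # percent


def spesi_score(p: Patient) -> int:
    """One point per criterion met (assumed cut-offs: commonly published sPESI)."""
    return sum([
        p.age > 80,
        p.cancer,
        p.cardiopulmonary,
        p.heart_rate >= 110,
        p.systolic_bp < 100,
        p.oxygen_saturation < 90,
    ])


def c_index(scores: Sequence[float], died: Sequence[bool]) -> float:
    """Share of (died, survived) pairs where the patient who died scored higher.

    Ties count as half a pair. For a binary outcome this equals the ROC AUC.
    """
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    concordant = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return concordant / (len(pos) * len(neg))


# Made-up patients for illustration only; not data from the BBC-VTE cohort.
high_risk = Patient(85, True, False, 115, 95, 88)
low_risk = Patient(60, False, False, 80, 120, 98)
```

Any model that outputs a risk score, including a Random Forest probability, can be passed to `c_index` in place of the sPESI points, which is what makes the c-index a common yardstick for the comparisons reported above.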
cardiac & cardiovascular systems