A new risk assessment model of venous thromboembolism by considering fuzzy population

Xin Wang, Yuqing Yang, Xinyu Hong, Sihua Liu, Jianchu Li, Ting Chen, Juhong Shi
DOI: https://doi.org/10.21203/rs.3.rs-2987619/v1
2023-01-01
Abstract:
Background Inpatients at high risk of venous thromboembolism (VTE) face serious threats to their health and finances. Many studies that used machine learning (ML) models to predict VTE risk neglected the influence of the ‘fuzzy population’ and achieved inferior results. The ‘fuzzy population’ phenomenon is common in clinical practice: because of the low incidence rate and complex pathogenesis of a disease, healthy individuals can share the same or very similar characteristics with patients, which makes the two hard for doctors to differentiate. Taking the effect of the ‘fuzzy population’ into account, our study aims to develop a new VTE risk assessment model suitable for Chinese medical inpatients.
Methods Inpatients in the medical department of Peking Union Medical College Hospital (PUMCH) from January 2014 to June 2016 were included. A new ML VTE risk assessment model was built through population splitting. First, patients were classified into groups based on the values of feature vectors consisting of multiple VTE risk factors; then trustless groups were filtered out; finally, ML models were built on the training data in units of groups. The sensitivity and specificity of our method were compared with those of five ML models (support vector machine (SVM), random forest (RF), gradient boosting decision tree (GBDT), logistic regression (LR), and XGBoost) and of the Padua model, which is widely used in clinical practice.
Results The ‘fuzzy population’ phenomenon was verified on the VTE dataset. Compared with the Padua model, the proposed model showed higher sensitivity (89.95% vs. 84.66%) and specificity (67.86% vs. 61.27%) on the training data, and higher specificity (64.94% vs. 63.30%) with the same sensitivity (90.24% vs. 90.24%) on the test data. Our model was more robust than the other five ML models, with smaller standard deviations of sensitivity and specificity. In addition, none of the five ML models surpassed the Padua model in both sensitivity and specificity simultaneously.
Conclusions The proposed model achieved higher specificity than the Padua model with equal or higher sensitivity, and it was more robust than traditional ML models. By modeling the ‘fuzzy population’, this study built a population-split-based ML model of VTE for Chinese medical inpatients, and the approach can be applied more broadly to risk assessment of other diseases.
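The population-splitting step described in the Methods can be illustrated with a minimal Python sketch. The abstract does not define how a group is judged trustless, so the rule used here (a group is kept only if it is large enough and its labels are nearly unanimous) is an assumption for illustration, not the authors' criterion. The names split_population, min_size, low, high, and sensitivity_specificity are likewise hypothetical.

```python
from collections import defaultdict


def split_population(records, min_size=3, low=0.2, high=0.8):
    """Group (features, label) records by identical feature vector.

    ASSUMPTION: a group counts as 'trusted' when it has at least
    `min_size` members and its observed VTE rate is <= `low` or
    >= `high`. All other groups are treated as 'fuzzy' (trustless).
    This rule is illustrative and is not taken from the paper.
    """
    groups = defaultdict(list)
    for features, label in records:
        groups[tuple(features)].append(label)

    trusted, fuzzy = {}, {}
    for key, labels in groups.items():
        rate = sum(labels) / len(labels)  # fraction of VTE cases in the group
        if len(labels) >= min_size and (rate <= low or rate >= high):
            trusted[key] = rate
        else:
            fuzzy[key] = rate
    return trusted, fuzzy


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = VTE)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy data: each record is (binary risk-factor vector, VTE label).
records = (
    [((1, 0), 1)] * 4                      # consistent high-risk group
    + [((0, 0), 0)] * 5                    # consistent low-risk group
    + [((0, 1), 1), ((0, 1), 0)] * 2       # same features, mixed outcomes
    + [((1, 1), 1)]                        # too few members to trust
)
trusted, fuzzy = split_population(records)
```

In this toy data the groups (1, 0) and (0, 0) are kept, while (0, 1) and (1, 1) are set aside as fuzzy. A downstream classifier would then be trained on the trusted groups only and evaluated with sensitivity_specificity, the two metrics the abstract reports.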