Combining multimorbidity clustering with limited demographic information enables high-precision outcome predictions

Fabio S. Ferreira, Erwann Le Lannou, Benjamin Post, Shlomi Haar, Balasundaram Kadiverlu, Stephen J. Brett, Aldo A. Faisal
DOI: https://doi.org/10.1101/2024.05.28.24308024
2024-05-28
Abstract: Multimorbidity, the coexistence of multiple health conditions in an individual, is prevalent and increasing worldwide, posing a growing challenge for patients and healthcare systems. Multimorbidity also increases the risk of hospital admission and even death. In this study, we employ a principled approach that uses longitudinal data routinely collected in electronic health records, linked to half a million people in the UK Biobank, to generate digital comorbidity fingerprints (DCFs) with a topic modelling approach, Latent Dirichlet Allocation. These comorbidity fingerprints summarise a patient's full secondary care clinical history, i.e. their comorbidities and past interventions. We identified 18 clinically relevant DCFs that capture nuanced combinations of diseases and risk factors: some group cardiovascular disorders with their common risk factors, while others are novel, non-obvious groupings that differ in both breadth and depth from existing observational disease associations. The DCFs, combined with demographic characteristics, performed on par with or outperformed traditional models of all-cause mortality and hospital admission, showcasing the potential of data-driven strategies in healthcare forecasting. The comorbidity fingerprints, together with age and number of hospital admissions, were the most important factors in the predictions. Additionally, our DCF approach showed robust and consistent performance over time. Our findings underscore the promising role of interpretable data-driven approaches in healthcare forecasting, suggesting improved risk profiling for individual clinical decisions and targeted public health interventions.
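The abstract says only that the DCFs come from Latent Dirichlet Allocation over each patient's secondary-care history and are then combined with demographics for prediction. The following is a minimal sketch of that idea under stated assumptions, not the authors' pipeline. It treats each patient as a "document" whose "words" are integer-coded diagnoses and interventions, fits LDA by collapsed Gibbs sampling (a standard fitting method, not necessarily the one used in the paper), and takes each patient's topic proportions as their fingerprint. The function names `lda_gibbs` and `build_features`, the hyperparameters `alpha` and `beta`, and the toy code IDs are hypothetical; appending age and admission count reflects the features the abstract reports as most important.

```python
import random


def lda_gibbs(docs, n_topics, n_codes, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for LDA.

    docs: one list per patient of integer clinical-code IDs in [0, n_codes).
    Returns theta, the per-patient topic proportions (the 'fingerprint').
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]              # patient-topic counts
    n_kw = [[0] * n_codes for _ in range(n_topics)]    # topic-code counts
    n_k = [0] * n_topics                               # codes assigned to each topic
    z = []                                             # topic of every code occurrence

    # Random initial topic assignment for every code occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + n_codes * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each patient's topic mixture (rows sum to 1).
    theta = []
    for d, doc in enumerate(docs):
        total = len(doc) + n_topics * alpha
        theta.append([(n_dk[d][t] + alpha) / total for t in range(n_topics)])
    return theta


def build_features(theta, ages, n_admissions):
    """Hypothetical: append age and admission count to each fingerprint."""
    return [fp + [age, adm] for fp, age, adm in zip(theta, ages, n_admissions)]


if __name__ == "__main__":
    # Toy histories: codes 0-2 and 3-5 stand in for two illustrative disease groups.
    docs = [[0, 1, 2, 0], [1, 2, 0], [3, 4, 5], [4, 5, 3, 3]]
    theta = lda_gibbs(docs, n_topics=2, n_codes=6, n_iter=50, seed=1)
    for row in build_features(theta, [70, 55, 62, 80], [3, 0, 1, 5]):
        print([round(v, 2) for v in row])
```

The resulting feature rows could then be passed to any classifier for mortality or admission prediction. The paper reports 18 fingerprints, so a real run would use `n_topics=18` on far larger code vocabularies, probably with an optimised library implementation rather than this pure-Python loop.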
What problem does this paper attempt to address?