Multinomial belief networks for healthcare data

H. C. Donker, D. Neijzen, J. de Jong, G. A. Lunter
2024-04-06
Abstract: Healthcare data from patient or population cohorts are often characterized by sparsity, high missingness and relatively small sample sizes. In addition, being able to quantify uncertainty is often important in a medical context. To address these analytical requirements we propose a deep generative Bayesian model for multinomial count data. We develop a collapsed Gibbs sampling procedure that takes advantage of a series of augmentation relations, inspired by the Zhou–Cong–Chen model. We visualise the model's ability to identify coherent substructures in the data using a dataset of handwritten digits. We then apply it to a large experimental dataset of DNA mutations in cancer and show that we can identify biologically meaningful clusters of mutational signatures in a fully data-driven way.
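The abstract relies on augmentation relations inspired by the Zhou–Cong–Chen model. One standard relation from that line of work is the Chinese restaurant table (CRT) augmentation of a negative binomial count: if m ~ NB(r, p) and ℓ | m ~ CRT(m, r), then marginally ℓ ~ Poisson(−r ln(1 − p)). The sketch below samples CRT variables and checks that marginal mean by Monte Carlo. It is a minimal illustration assuming NumPy; this summary does not state which relations MBN uses or how, and the function names `sample_crt`, `crt_mean` and `mean_tables_under_nb` are hypothetical.

```python
import numpy as np

# Illustrative sketch of one augmentation relation; not the MBN authors' code.


def sample_crt(m: int, r: float, rng: np.random.Generator) -> int:
    """Draw l ~ CRT(m, r): the number of occupied tables after m customers
    are seated by a Chinese restaurant process with concentration r.

    Uses l = sum_{i=1}^{m} b_i with independent b_i ~ Bernoulli(r / (r + i - 1)).
    """
    if m < 0 or r <= 0:
        raise ValueError("need m >= 0 and r > 0")
    if m == 0:
        return 0
    i = np.arange(1, m + 1)
    probs = r / (r + i - 1.0)  # the first customer always opens a table (prob 1)
    return int(rng.binomial(1, probs).sum())


def crt_mean(m: int, r: float) -> float:
    """Exact E[l] for l ~ CRT(m, r)."""
    i = np.arange(1, m + 1)
    return float(np.sum(r / (r + i - 1.0)))


def mean_tables_under_nb(r: float, p: float, n_draws: int, rng: np.random.Generator):
    """Monte Carlo check: m ~ NB(r, p), l | m ~ CRT(m, r) implies
    l ~ Poisson(-r * log(1 - p)) marginally.

    NB(r, p) here has pmf Gamma(m + r) / (m! Gamma(r)) * p**m * (1 - p)**r.
    NumPy's negative_binomial counts failures before r successes with success
    probability q, so the matching call uses q = 1 - p.
    """
    ms = rng.negative_binomial(r, 1.0 - p, size=n_draws)
    ls = np.array([sample_crt(int(m), r, rng) for m in ms])
    return ls.mean(), -r * np.log1p(-p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    empirical, theory = mean_tables_under_nb(2.0, 0.5, 20_000, rng)
    print(f"empirical E[l] = {empirical:.3f}, Poisson rate = {theory:.3f}")
```

In gamma-belief-network-style samplers, this kind of relation lets latent counts at one layer be converted into Poisson-distributed counts that can be propagated to the layer above, which is what makes conjugate Gibbs updates possible.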
Machine Learning, Applications
What problem does this paper attempt to address?
This paper proposes the multinomial belief network (MBN), a deep generative Bayesian model for multinomial count data, aimed at healthcare data. Such data are often sparse, have a high rate of missing values and small sample sizes, and call for proper handling of uncertainty; in these settings, traditional maximum-likelihood methods can yield biased estimates. The authors develop a collapsed Gibbs sampler that exploits a series of augmentation relations inspired by the Zhou–Cong–Chen model. MBN identifies coherent substructures in the data: the authors illustrate this on a dataset of handwritten digits and then apply it to a large experimental dataset of DNA mutations in cancer, where it recovers biologically meaningful clusters of mutational signatures. Compared with traditional topic models, MBN captures interactions between topics across multiple layers, which the authors argue makes it better suited to limited and uncertain healthcare data.
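To make "collapsed Gibbs sampling" and the single-layer topic-model baseline concrete, here is a minimal sketch of a collapsed Gibbs sampler for latent Dirichlet allocation (LDA). The Dirichlet-distributed topic proportions and topic-word distributions are integrated out, and only the token-topic assignments are resampled from their full conditional. This is the textbook single-layer sampler, not the MBN sampler; the function name `lda_collapsed_gibbs` and the default hyperparameters `alpha` and `beta` are illustrative assumptions.

```python
import numpy as np

# Illustrative single-layer baseline; not the MBN sampler.


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=50, seed=0):
    """Collapsed Gibbs sampler for LDA.

    docs: list of lists of word ids in [0, vocab_size).
    Returns token assignments z, doc-topic counts and topic-word counts.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics), dtype=np.int64)   # doc-topic counts
    n_kw = np.zeros((n_topics, vocab_size), dtype=np.int64)  # topic-word counts
    n_k = np.zeros(n_topics, dtype=np.int64)                  # tokens per topic

    # Random initial assignment of every token to a topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(n_topics, size=len(doc))
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove token i from the counts to condition on z_{-i}.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = k | z_{-i}, w), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its new topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw


if __name__ == "__main__":
    docs = [[0, 1, 2, 0], [3, 4, 3], [0, 2, 1, 1, 4]]
    z, n_dk, n_kw = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=5, n_iters=20)
    print(n_kw)
```

The sketch has a single layer of topics; MBN, as summarised above, additionally models interactions between topics across multiple layers.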