Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series

Onur Poyraz, Pekka Marttinen
2023-11-14
Abstract: Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and demonstrate how it elegantly overcomes these challenges. To make the model learning feasible, we derive two algorithms to sample the sequences of the latent variables in the CHMM: samplers based on (i) particle filtering and (ii) factorized approximation. Compared to existing inference methods, our algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model. Experiments on challenging real-world epidemiological and semi-synthetic data demonstrate the advantages of the M-CHMM: improved data fit, capacity to efficiently handle missing and noisy measurements, improved prediction accuracy, and ability to identify interpretable subsets in the data.
Machine Learning, Applications
What problem does this paper attempt to address?
This paper addresses the main challenges in analyzing multivariate medical time-series data:

1. **Irregular Sampling**: The time intervals between measurements may be inconsistent.
2. **Noise and Missing Values**: Measurements are noisy, and some are missing.
3. **Heterogeneous Patient Populations**: Different patient groups follow different dynamics, which violates the exchangeability assumption.
4. **Interpretability and Uncertainty Quantification**: In the medical field, model interpretability and the quantification of uncertainty are crucial.

To address these challenges, the authors propose a new class of models, a mixture of coupled hidden Markov models (M-CHMM), and show how it overcomes these problems. Specifically, the M-CHMM can:

- Fit the data better.
- Efficiently handle missing and noisy measurements.
- Improve prediction accuracy.
- Identify interpretable subsets in the data.

In addition, the authors derive two new algorithms for sampling the sequences of latent variables in the CHMM: one based on Particle Filtering (PF) and one based on Factorized Approximation (fFFBS). Compared to existing inference methods, these algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model.

### Formula Summary

1. **Probability Definition of CHMM**:
\[ p(\pi_C^{1:T}, x_C^{1:T}) = \left(\prod_{c \in C} p(\pi_c^1)\right) \left(\prod_{t=2}^{T} \prod_{c \in C} p(\pi_c^t \mid \pi_C^{t-1})\right) \left(\prod_{t=1}^{T} \prod_{c \in C} p(x_c^t \mid \pi_c^t)\right) \]
Each chain \(c\) has its own latent state \(\pi_c^t\) and emission \(x_c^t\), but its transition is conditioned on the previous states of all chains, \(\pi_C^{t-1}\).

2. **Probability Definition of M-CHMM**:
\[ p(x_n) = \sum_{m=1}^{M} \gamma_m L_{\text{CHMM}}(x_n \mid \theta_{\text{CHMM}}^m) \]
where \(\gamma_m\) are the mixing coefficients, satisfying \(\sum_{m=1}^{M} \gamma_m = 1\) and \(0 \leqslant \gamma_m \leqslant 1\).

3. **Design of Transition Matrix**:
\[ \mu_c^t = \beta_{c \leftarrow c_0} + \sum_{\hat{c} \in C \setminus c} \sum_{k \in K} \beta_{c \leftarrow \hat{c}_k} \, I[\pi_{\hat{c}}^{t-1} = k] \]
\[ \tau_c^t = \sigma_{\text{row}}(\mu_c^t) \]
Here \(\beta_{c \leftarrow c_0}\) is the baseline term for chain \(c\), and \(\beta_{c \leftarrow \hat{c}_k}\) is the contribution added when another chain \(\hat{c}\) was in state \(k\) at time \(t-1\); \(\sigma_{\text{row}}\) is applied row by row to turn \(\mu_c^t\) into the transition matrix \(\tau_c^t\).

With these components, the M-CHMM handles complex multivariate medical time-series data better and provides more accurate and interpretable results.
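To make the transition-matrix design and the mixture likelihood more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes that \(\sigma_{\text{row}}\) is a row-wise softmax, that each \(\beta\) term is a K x K matrix of logits, and that the per-component CHMM likelihoods are already available (in the paper they would come from the sampling algorithms). The function names, the dictionary layout, and all numbers are hypothetical.

```python
import math


def row_softmax(matrix):
    """Apply a numerically stable softmax to each row of a list-of-lists matrix."""
    result = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def coupled_transition(baseline, influence, prev_states, c):
    """Build tau_c^t for chain c (hypothetical data layout).

    baseline[c]            : K x K logits, the beta_{c <- c_0} term
    influence[c][other][k] : K x K logits added when chain `other`
                             was in state k at time t-1
    prev_states            : dict mapping chain name -> state at time t-1
    """
    K = len(baseline[c])
    mu = [list(row) for row in baseline[c]]  # start from the baseline term
    for other, k in prev_states.items():
        if other == c:
            continue  # the sum runs over the other chains only
        delta = influence[c][other][k]  # indicator I[pi_other^{t-1} = k] selects this term
        for i in range(K):
            for j in range(K):
                mu[i][j] += delta[i][j]
    return row_softmax(mu)  # assumption: sigma_row is a row-wise softmax


def mixture_log_likelihood(log_gammas, component_log_liks):
    """Return log p(x_n) = log sum_m gamma_m * L_CHMM(x_n | theta_m), via log-sum-exp."""
    terms = [lg + ll for lg, ll in zip(log_gammas, component_log_liks)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy example: two chains "a" and "b", two states each (made-up numbers).
zeros = [[0.0, 0.0], [0.0, 0.0]]
baseline = {"a": zeros}
influence = {"a": {"b": {0: zeros, 1: [[0.0, 1.0], [0.0, 1.0]]}}}

# Chain "b" was in state 1, which pushes chain "a" toward its state 1.
tau = coupled_transition(baseline, influence, {"a": 0, "b": 1}, "a")

# Two mixture components with equal weights and made-up likelihoods 0.2 and 0.4.
log_px = mixture_log_likelihood(
    [math.log(0.5), math.log(0.5)],
    [math.log(0.2), math.log(0.4)],
)
```

Working in log space for the mixture is a design choice of this sketch: sequence likelihoods are typically very small, so summing them directly would risk underflow.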