Hidden Markov Models on Variable Blocks with a Modal Clustering Algorithm and Applications

Lin Lin, Jia Li
DOI: https://doi.org/10.48550/arXiv.1606.08903
2016-06-29
Abstract: Motivated by high-throughput single-cell cytometry data with applications to vaccine development and immunological research, we consider statistical clustering in large-scale data that contain multiple rare clusters. We propose a new hierarchical mixture model, the Hidden Markov Model on Variable Blocks (HMM-VB), and a new mode search algorithm, Modal Baum-Welch (MBW), for efficient clustering. Exploiting the widely accepted chain-like dependence among groups of variables in cytometry data, we treat the hierarchy of variable groups as a figurative time line and employ an HMM-type model, namely HMM-VB. We also propose to use mode-based clustering, also known as modal clustering, and overcome its exponential computational complexity with MBW. In a series of experiments on simulated data, HMM-VB and MBW perform better than existing methods. We also apply our method to identify rare cell subsets in cytometry data and examine its strengths and limitations.
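The abstract does not spell out the model, so the following is only a minimal sketch of the standard HMM fact that the chain-like dependence makes use of. It is not the authors' MBW algorithm. If each variable block has its own set of hidden states, the implied mixture has one component per state sequence, and that number grows as the product of the per-block state counts. A forward recursion evaluates the same density in time linear in the number of blocks. All names (`density_brute_force`, `density_forward`), the diagonal-covariance Gaussian emissions, and every parameter value are assumptions made for illustration.

```python
import itertools
import math


def gauss_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at the vector x."""
    d = 1.0
    for xi, mi, vi in zip(x, mean, var):
        d *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2.0 * math.pi * vi)
    return d


def density_brute_force(blocks, pi, trans, means, variances):
    """Sum over every state sequence; cost grows as the product of state counts."""
    n_states = [len(m) for m in means]
    total = 0.0
    for seq in itertools.product(*(range(k) for k in n_states)):
        p = pi[seq[0]]
        for t in range(1, len(seq)):
            p *= trans[t - 1][seq[t - 1]][seq[t]]
        for t, s in enumerate(seq):
            p *= gauss_pdf(blocks[t], means[t][s], variances[t][s])
        total += p
    return total


def density_forward(blocks, pi, trans, means, variances):
    """Forward recursion over blocks; cost grows linearly in the number of blocks."""
    alpha = [pi[s] * gauss_pdf(blocks[0], means[0][s], variances[0][s])
             for s in range(len(pi))]
    for t in range(1, len(blocks)):
        alpha = [
            sum(alpha[r] * trans[t - 1][r][s] for r in range(len(alpha)))
            * gauss_pdf(blocks[t], means[t][s], variances[t][s])
            for s in range(len(means[t]))
        ]
    return sum(alpha)


# Hypothetical toy model: 3 variable blocks of dimension 2, 1, 2
# with 2, 3, 2 hidden states, i.e. 2 * 3 * 2 = 12 mixture components.
pi = [0.6, 0.4]
trans = [
    [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]],    # block 1 -> block 2 (2 x 3)
    [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]],  # block 2 -> block 3 (3 x 2)
]
means = [
    [[0.0, 0.0], [3.0, 3.0]],
    [[-1.0], [0.0], [2.0]],
    [[1.0, -1.0], [-2.0, 2.0]],
]
variances = [
    [[1.0, 1.0], [0.5, 2.0]],
    [[1.0], [0.5], [1.5]],
    [[1.0, 1.0], [2.0, 0.5]],
]
x_blocks = [[0.5, -0.2], [0.3], [0.8, -1.1]]  # one observation split into blocks

brute = density_brute_force(x_blocks, pi, trans, means, variances)
fwd = density_forward(x_blocks, pi, trans, means, variances)
print(brute, fwd)  # the two values should agree up to floating-point error
```

With only 12 components the brute-force sum is cheap, but the same enumeration becomes infeasible once there are many blocks with several states each, whereas the recursion stays tractable.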