Without Pain -- Clustering Categorical Data Using a Bayesian Mixture of Finite Mixtures of Latent Class Analysis Models

Gertraud Malsiner-Walli, Bettina Grün, Sylvia Frühwirth-Schnatter
DOI: https://doi.org/10.48550/arXiv.2407.05431
2024-07-08
Abstract: We propose a Bayesian approach for model-based clustering of multivariate categorical data where variables are allowed to be associated within clusters and the number of clusters is unknown. The approach uses a two-layer mixture of finite mixtures model where the cluster distributions are approximated using latent class analysis models. A careful specification of priors with suitable hyperparameter values is crucial to identify the two-layer structure and obtain a parsimonious cluster solution. We outline the Bayesian estimation based on Markov chain Monte Carlo sampling with the telescoping sampler and describe how to obtain an identified clustering model by resolving the label switching issue. Empirical demonstrations in a simulation study using artificial data as well as a data set on low back pain indicate the good clustering performance of the proposed approach, provided hyperparameters are selected which induce sufficient shrinkage.
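The two-layer structure described in the abstract can be illustrated by simulating from it. The following is a minimal sketch, not the authors' implementation: it fixes the number of clusters and classes in advance, uses no priors and no telescoping sampler, and only shows how each cluster is itself a mixture of latent classes whose variables are independent given the class. The function name `sample_two_layer_lca` and all parameter values are hypothetical.

```python
import random


def sample_two_layer_lca(n, cluster_weights, class_weights, class_probs, seed=0):
    """Draw n observations from a two-layer mixture of latent class models.

    cluster_weights[k]      : weight of cluster k (upper layer).
    class_weights[k][l]     : weight of latent class l inside cluster k (lower layer).
    class_probs[k][l][j][d] : probability that variable j takes category d
                              in class l of cluster k.
    All values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    clusters, data = [], []
    for _ in range(n):
        # Upper layer: pick the cluster.
        k = rng.choices(range(len(cluster_weights)), weights=cluster_weights)[0]
        # Lower layer: pick a latent class within that cluster.
        l = rng.choices(range(len(class_weights[k])), weights=class_weights[k])[0]
        # Variables are independent given the class; mixing over classes
        # is what induces association between variables within a cluster.
        row = [rng.choices(range(len(p)), weights=p)[0] for p in class_probs[k][l]]
        clusters.append(k)
        data.append(row)
    return clusters, data


# Two clusters, two classes per cluster, three binary variables (hypothetical).
cluster_weights = [0.6, 0.4]
class_weights = [[0.5, 0.5], [0.7, 0.3]]
class_probs = [
    [[[0.9, 0.1], [0.9, 0.1], [0.8, 0.2]], [[0.2, 0.8], [0.1, 0.9], [0.8, 0.2]]],
    [[[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]], [[0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]],
]
clusters, data = sample_two_layer_lca(200, cluster_weights, class_weights, class_probs)
```

In the first hypothetical cluster, the first two variables tend to be both 0 or both 1 depending on the class drawn, which gives a within-cluster association that a single product-of-categoricals cluster distribution could not capture.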
Methodology