Bayesian clustering of mixed-type data with relevant variable identification

Nurul Afiqah Burhanuddin, Kamarulzaman Ibrahim, Mohd Bakri Adam, Norwati Mustapha, Hani Syahida Zulkafli
DOI: https://doi.org/10.1080/03610918.2024.2361135
2024-06-08
Communications in Statistics - Simulation and Computation
Abstract: This paper presents a Bayesian nonparametric model for clustering datasets with continuous, ordinal, and nominal variables. The ordinal and nominal variables are treated using a latent variable framework based on the multivariate probit and multinomial probit models. Combining the observed continuous variables with the latent continuous variables allows us to jointly model a set of mixed-type variables via a Dirichlet process Gaussian mixture model. The use of a hierarchical shrinkage prior on the component means leads to improved clustering performance and provides an intuitive way to identify the variables relevant to clustering. Numerical results on simulated and real data illustrate the applicability of the proposed model.
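The latent variable idea in the abstract can be illustrated with a small generative sketch. This is not the authors' code or their exact parameterization: the two cluster means, the cutpoints, the identity covariance, and all variable names below are assumptions chosen for illustration. It only shows how one Gaussian vector per observation can give rise to a continuous, an ordinal (probit thresholds), and a nominal (multinomial probit utilities) variable at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical setup: two clusters in a 4-dimensional Gaussian space.
#   dim 0     -> observed continuous variable
#   dim 1     -> latent variable behind an ordinal variable
#   dims 2, 3 -> latent utilities behind a 3-category nominal variable
means = np.array([[-2.0, -1.0,  1.0, -1.0],
                  [ 2.0,  1.5, -1.0,  1.0]])
labels = rng.integers(0, 2, size=n)                # true cluster of each row
z = means[labels] + rng.standard_normal((n, 4))    # identity covariance (assumed)

# The continuous variable is observed directly.
continuous = z[:, 0]

# Ordinal probit: the category is the number of cutpoints at or below the
# latent value. The cutpoints here are illustrative.
cutpoints = np.array([-0.5, 1.0])
ordinal = np.digitize(z[:, 1], cutpoints)          # values in {0, 1, 2}

# Multinomial probit with K = 3: K - 1 utilities relative to a baseline.
# Baseline category 0 is chosen when every utility is negative; otherwise
# the category with the largest utility is chosen.
util = z[:, 2:4]
nominal = np.where(util.max(axis=1) < 0, 0, util.argmax(axis=1) + 1)

print(continuous[:3], ordinal[:3], nominal[:3])
```

In a clustering model of this kind, inference runs in the opposite direction: the latent coordinates are unobserved and must be sampled or estimated along with the cluster assignments, which this sketch does not attempt.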
statistics & probability