Bayesian Clustering with Variable and Transformation Selections

Jun S. Liu, Junni L. Zhang, Michael J. Palumbo, Charles E. Lawrence
2003-01-01
Abstract: The clustering problem has attracted much attention from both statisticians and computer scientists in the past 50 years. Methods such as hierarchical clustering and the K-means method are convenient and competitive first choices off the shelf for the scientist. Gaussian mixture modelling is another popular but computationally expensive clustering strategy, especially when the data are high-dimensional. We propose to first conduct a principal component analysis (PCA) or correspondence analysis (CA) for dimension reduction, and then fit Gaussian mixtures to the data projected onto the several major PCA or CA directions. Two technical difficulties of this approach are: (a) the selection of a subset of the PCA factors that are informative for clustering, and (b) the selection of a proper transformation for each factor. We propose a Bayesian formulation and Markov chain Monte Carlo strategies that overcome the two difficulties, and we examine the performance of the new method through both simulation studies and real applications in molecular imaging analysis and DNA microarray analysis.
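
The two-stage idea in the abstract (reduce dimension with PCA, then fit a Gaussian mixture to the projected data) can be sketched as follows. This is only an illustration of the baseline pipeline, not the authors' method: it does not perform the Bayesian selection of factors or transformations, and it uses plain EM rather than Markov chain Monte Carlo. The function names `pca_project` and `fit_diag_gmm`, the diagonal-covariance assumption, the quantile-based initialisation, and the variance floor are all assumptions made for this sketch.

```python
import numpy as np


def pca_project(X, n_components):
    """Project centred data onto its leading principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def fit_diag_gmm(Z, k, n_iter=100):
    """Fit a k-component diagonal-covariance Gaussian mixture by EM.

    Assumption for this sketch: means are initialised from k equal-sized
    groups of points ordered along the first projected coordinate.
    """
    n, d = Z.shape
    order = np.argsort(Z[:, 0])
    means = np.array([Z[idx].mean(axis=0) for idx in np.array_split(order, k)])
    variances = np.tile(Z.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities from per-component log densities.
        diff = Z[:, None, :] - means[None, :, :]            # shape (n, k, d)
        log_p = -0.5 * (
            np.sum(diff**2 / variances, axis=2)
            + np.sum(np.log(2.0 * np.pi * variances), axis=1)
        )
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)           # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: update weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        diff = Z[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + 1e-6

    return resp.argmax(axis=1), means, variances, weights
```

A typical call would be `labels, means, variances, weights = fit_diag_gmm(pca_project(X, 2), k=2)` for an `(n_samples, n_features)` array `X`. In this sketch the number of retained directions is fixed by hand, and choosing it this way is exactly the difficulty (a) that the abstract says the Bayesian formulation is designed to address.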