A Clustering Method Based on Dirichlet Process Mixture Model

Zhang Lin (张林), Liu Hui (刘辉)
2012-01-01
Abstract: When a finite mixture model is used to cluster high-dimensional data, the number of clusters must be fixed in advance, which degrades the precision and generalization of the clustering. This paper builds a Dirichlet process infinite mixture model to cluster high-dimensional data. Based on the urn model, the posterior distribution of each parameter is derived, and all parameters, including the number of latent clusters, are estimated by Gibbs sampling, a Markov chain Monte Carlo (MCMC) method. Clustering results on a simulated dataset and on the IRIS dataset show that the method correctly estimates the number of latent clusters after 200 Gibbs sampling iterations. The average time per iteration was 0.1850 s on the simulated dataset and 0.1455 s on the IRIS dataset, and the time complexity of each iteration is O(N), where N is the number of samples.
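The abstract only outlines the sampler, so the following is a minimal sketch of a collapsed Gibbs sampler for a Dirichlet process mixture (the scheme often called Neal's Algorithm 3), written under our own assumptions rather than the authors' exact model: an isotropic Gaussian likelihood with known variance `sigma2`, a conjugate Gaussian prior N(`mu0`, `tau2`) on each coordinate of a cluster mean, and concentration parameter `alpha`. The urn model appears in the assignment step: a point joins an existing cluster with weight proportional to that cluster's size times the point's posterior predictive density, or opens a new cluster with weight proportional to `alpha` times the prior predictive density, so the number of clusters is inferred rather than fixed. The function names, the toy data, and all default hyperparameters are hypothetical.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of the 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_predictive(x, n, s, sigma2, mu0, tau2):
    """Log posterior predictive density of point x under a cluster holding
    n points whose coordinate-wise sum is s. Assumes (hypothetically)
    x ~ N(mu, sigma2*I) and each coordinate of mu ~ N(mu0, tau2);
    n = 0 gives the prior predictive used for a new cluster."""
    total = 0.0
    for d, xd in enumerate(x):
        prec = 1.0 / tau2 + n / sigma2              # posterior precision of mu_d
        mean = (mu0 / tau2 + s[d] / sigma2) / prec  # posterior mean of mu_d
        total += log_normal(xd, mean, 1.0 / prec + sigma2)
    return total


def gibbs_dpmm(data, alpha=1.0, sigma2=0.25, mu0=0.0, tau2=25.0, iters=200, seed=0):
    """Collapsed Gibbs sampler for a DP mixture of isotropic Gaussians.
    Returns the final cluster labels and the number of occupied clusters."""
    rng = random.Random(seed)
    dim = len(data[0])
    z = [0] * len(data)                  # start with every point in cluster 0
    counts = {0: len(data)}              # cluster label -> number of members
    sums = {0: [sum(x[d] for x in data) for d in range(dim)]}  # label -> coordinate sums
    next_label = 1
    for _ in range(iters):
        for i, x in enumerate(data):
            # Remove x_i from its current cluster, dropping the cluster if it empties.
            k = z[i]
            counts[k] -= 1
            for d in range(dim):
                sums[k][d] -= x[d]
            if counts[k] == 0:
                del counts[k], sums[k]
            # Urn-model weights: n_k * predictive for each existing cluster,
            # alpha * prior predictive for a brand-new cluster.
            labels = list(counts)
            logw = [math.log(counts[c]) + log_predictive(x, counts[c], sums[c], sigma2, mu0, tau2)
                    for c in labels]
            logw.append(math.log(alpha) + log_predictive(x, 0, [0.0] * dim, sigma2, mu0, tau2))
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]  # unnormalized, overflow-safe
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(labels):            # open a new cluster
                k = next_label
                next_label += 1
                counts[k], sums[k] = 0, [0.0] * dim
            else:
                k = labels[choice]
            # Add x_i to the chosen cluster and update its cached statistics.
            z[i] = k
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += x[d]
    return z, len(counts)


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D Gaussian blobs, 30 points each.
    gen = random.Random(42)
    centers = [(-4.0, 0.0), (0.0, 4.0), (4.0, 0.0)]
    data = [[cx + gen.gauss(0.0, 0.5), cy + gen.gauss(0.0, 0.5)]
            for cx, cy in centers for _ in range(30)]
    labels, k = gibbs_dpmm(data, iters=50)
    print("estimated number of clusters:", k)
```

Because each cluster caches its size and coordinate sums, scoring one point against one cluster costs O(D), so a full sweep costs O(N·K·D) for K occupied clusters. With the number of clusters and the dimension bounded, this is linear in N, which is consistent with the O(N) per-iteration cost reported in the abstract. On real data such as IRIS, the fixed `sigma2` in this sketch would need tuning or its own conjugate update.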