How Many Clusters?: A Ying-Yang Machine Based Theory for a Classical Open Problem in Pattern Recognition

L Xu
DOI: https://doi.org/10.1109/icnn.1996.549130
2002-01-01
Abstract: Determining the number of clusters in classical mean square error (MSE) clustering analysis (e.g., by the well-known k-means algorithm) and determining the number of Gaussians in a finite Gaussian mixture (e.g., by the EM algorithm) are well-known model selection problems that play important roles in unsupervised pattern recognition. The problem has remained open for decades because, apart from some heuristic techniques, there has been no appropriate theory for solving it. This paper presents a theory for solving this problem based on the Ying-Yang machine, a Bayesian-Kullback learning scheme for unified learning (Xu, 1995, 1996). From this theory, we obtain criteria for selecting the correct number of clusters in MSE clustering or in a Gaussian mixture. In addition, an automatic procedure is designed for a fast implementation of the selection. Experimental results are provided to demonstrate the success of the approach.
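To make the model selection problem concrete, the following minimal sketch shows why the MSE objective alone cannot pick the number of clusters: the best achievable MSE never increases as k grows, so minimizing it always favors more clusters. This is an illustration only and is not the Ying-Yang criterion from the paper. The function names (`segment_sse`, `optimal_mse_by_k`) and the toy data are assumptions introduced here, and the sketch relies on the fact that optimal MSE clusters of 1-D data are contiguous runs of the sorted points.

```python
# Illustrative sketch only; NOT the selection criterion proposed in the paper.
# Exact 1-D MSE clustering by dynamic programming over sorted points.

def segment_sse(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean for sorted points i..j-1."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return max(0.0, sq - s * s / n)  # clamp tiny negative rounding errors


def optimal_mse_by_k(points, max_k):
    """Return {k: minimal MSE with k clusters} for k = 1..max_k (max_k <= len(points))."""
    xs = sorted(points)
    n = len(xs)
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    inf = float("inf")
    # best[i]: minimal total squared error for the first i points using k-1 clusters
    best = [0.0] + [inf] * n
    results = {}
    for k in range(1, max_k + 1):
        new = [inf] * (n + 1)
        for i in range(k, n + 1):
            # the last cluster covers sorted points j..i-1
            new[i] = min(
                best[j] + segment_sse(prefix, prefix_sq, j, i)
                for j in range(k - 1, i)
            )
        best = new
        results[k] = best[n] / n
    return results


# Hypothetical toy data: three well-separated groups on the real line.
data = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1, 8.8, 9.0, 9.2]
mse = optimal_mse_by_k(data, 5)
for k in sorted(mse):
    print(k, mse[k])
```

On data like this, the MSE drops sharply up to the true number of groups and then keeps shrinking slowly, which is why a separate selection criterion is needed.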