Cluster Number Estimation by Adaptively Identifying Ambiguously Clustered Pairs

He Zhenfeng, Zhuang Yu
DOI: https://doi.org/10.1109/chicc.2016.7554487
2016-01-01
Abstract: The consensus matrix (CM) constructed in consensus clustering provides clustering-stability information for every instance pair, so the cluster number can be estimated by comparing the proportions of ambiguously clustered pairs in different CMs. This paper presents a hidden Markov model (HMM) based approach to identifying ambiguously clustered pairs. The approach analyses an occurrence sequence obtained by counting how often each possible element value occurs in an original CM. This sequence is treated as an observation sequence generated by a three-state Markov chain whose states are separately clustered, ambiguously clustered, and jointly clustered. A constrained HMM-based method is therefore presented to segment it. The method first uses the Baum-Welch algorithm to learn a constrained HMM. It then extends the learned model with a minimum-length constraint and uses a modified Viterbi algorithm to extract an acceptable segmentation into three subsequences. The pairs corresponding to the middle segment are identified as ambiguous. The HMM-based approach is incorporated into a general K-means-based consensus clustering framework to evaluate CMs and estimate the cluster number. Experimental results on four UCI datasets suggest that this approach estimates the cluster number more effectively than some recent approaches.
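A minimal Python sketch of the counting stage, assuming the base clusterings (for example, K-means runs, which are omitted here) are already available as integer label vectors. The function names `consensus_matrix`, `occurrence_sequence`, `ambiguous_proportion` and `estimate_k` are hypothetical, not taken from the paper. With R base clusterings every CM entry equals k/R for some k in 0..R, so the occurrence sequence has R + 1 positions, ordered from pairs never grouped together to pairs always grouped together. Counting only pairs i < j, and choosing the candidate K with the smallest ambiguous proportion, are assumptions about details the abstract does not spell out.

```python
import numpy as np


def consensus_matrix(labelings) -> np.ndarray:
    """Entry (i, j) is the fraction of base clusterings that put i and j together.

    labelings: array-like of shape (R, n), one row of cluster labels per run.
    """
    labelings = np.asarray(labelings)
    runs, n = labelings.shape
    cm = np.zeros((n, n))
    for labels in labelings:
        # Co-membership indicator for this run: True where the two labels agree.
        cm += labels[:, None] == labels[None, :]
    return cm / runs


def occurrence_sequence(cm: np.ndarray, runs: int) -> np.ndarray:
    """Number of pairs i < j whose CM value equals k / runs, for k = 0, ..., runs."""
    iu = np.triu_indices(cm.shape[0], k=1)
    # Recover the integer co-clustering count k from each fractional entry.
    k = np.rint(cm[iu] * runs).astype(int)
    return np.bincount(k, minlength=runs + 1)


def ambiguous_proportion(occ: np.ndarray, start: int, stop: int) -> float:
    """Share of all pairs whose count lies in the middle segment occ[start:stop]."""
    return float(occ[start:stop].sum() / occ.sum())


def estimate_k(proportions: dict) -> int:
    """Assumed rule: choose the candidate K with the fewest ambiguous pairs."""
    return min(proportions, key=proportions.get)


# Toy example: three hypothetical base clusterings of four instances.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
cm = consensus_matrix(labelings)
occ = occurrence_sequence(cm, runs=3)
print(occ)                                     # [2 2 1 1]
# Middle-segment bounds [1, 3) are picked by hand here, not by an HMM.
print(ambiguous_proportion(occ, 1, 3))         # 0.5
print(estimate_k({2: 0.4, 3: 0.1, 4: 0.25}))   # 3
```

In the toy example, pair (0, 1) is grouped together in all three runs (k = 3), while pairs (0, 3) and (1, 3) are never grouped together (k = 0). The six pairs therefore split as 2, 2, 1, 1 across k = 0..3.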
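The segmentation step can be illustrated with a simplified stand-in, not the authors' modified Viterbi algorithm. This sketch does not implement Baum-Welch. It assumes that per-state emission log-likelihoods and self-transition probabilities are already available, for example from a fitted model. It also assumes the minimum length applies to every segment and that the chain starts in state 0 and must end in state 2. Under those assumptions, any admissible state path is 0...0 1...1 2...2, so scoring every pair of cut points finds the same optimum a duration-constrained Viterbi pass would. The search is O(T^2), which is cheap when T = R + 1 is the number of possible CM values. The name `segment_three` and its parameters are hypothetical.

```python
import numpy as np


def segment_three(log_emit: np.ndarray, stay, min_len: int):
    """Best split of a length-T sequence into three contiguous segments, one per
    state of a left-to-right chain 0 -> 1 -> 2, each at least min_len long.

    log_emit: (T, 3) log-likelihood of each observation under each state.
    stay:     (3,) self-transition probabilities in (0, 1]; state s moves to
              s + 1 with probability 1 - stay[s], so stay[0], stay[1] must be < 1.
    Returns (b1, b2): state 0 covers [0, b1), state 1 [b1, b2), state 2 [b2, T).
    The path is assumed to start in state 0 with probability 1.
    """
    stay = np.asarray(stay, dtype=float)
    T = log_emit.shape[0]
    if T < 3 * min_len:
        raise ValueError("sequence too short for three segments of min_len")
    log_stay = np.log(stay)
    log_move = np.log1p(-stay[:2])  # log P(0 -> 1), log P(1 -> 2)
    # prefix[t, s] = sum of log_emit[:t, s], so each segment sum costs O(1).
    prefix = np.vstack([np.zeros(3), np.cumsum(log_emit, axis=0)])

    def seg(s: int, a: int, b: int) -> float:
        # Emissions of [a, b) under state s plus its (b - a - 1) self-transitions.
        return prefix[b, s] - prefix[a, s] + (b - a - 1) * log_stay[s]

    best, best_cuts = -np.inf, None
    for b1 in range(min_len, T - 2 * min_len + 1):
        for b2 in range(b1 + min_len, T - min_len + 1):
            score = seg(0, 0, b1) + seg(1, b1, b2) + seg(2, b2, T) + log_move.sum()
            if score > best:
                best, best_cuts = score, (b1, b2)
    return best_cuts


# Hypothetical emissions: observation t fits state truth[t] (log-lik 0) and
# every other state poorly (log-lik -5).
truth = np.arange(9) // 3
log_emit = np.where(truth[:, None] == np.arange(3)[None, :], 0.0, -5.0)
print(segment_three(log_emit, [0.9, 0.9, 1.0], min_len=2))  # (3, 6)
```

In the setting the abstract describes, position t of the sequence corresponds to the CM value t/R. Pairs counted in the middle segment [b1, b2) are the ones labelled ambiguous. Raising `min_len` prevents the middle segment from collapsing to a single CM value when the emissions alone would favour that.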