Clustering Ensemble Based on Sample’s Certainty

Ji Xia, Liu Shuaishuai, Zhao Peng, Li Xuejun, Liu Qiong
DOI: https://doi.org/10.1007/s12559-021-09876-z
IF: 4.89
2021-01-01
Cognitive Computation
Abstract: The objective of a clustering ensemble is to fuse multiple base partitions (BPs) to find the underlying data structure. A sample can change its neighbors across different BPs, and samples differ in how stable their neighbor relationships are. This difference suggests that samples contribute unequally to detecting the underlying data structure. In addition, a clustering ensemble aims to integrate the inconsistent parts of the BPs by first extracting the consistent parts. However, existing clustering ensemble methods treat all samples equally: they consider neither the stability of a sample's neighbor relationships nor whether a sample belongs to the consistent or the inconsistent part of the BPs. To address these deficiencies, we introduce the certainty of a sample to quantify the stability of its neighbor relationships and propose a formula to calculate it. We then develop a clustering ensemble algorithm based on sample certainty. It rests on the following idea: the neighbor relationships of cluster cores are more stable across BPs, and different cluster cores usually do not form neighbor relationships in BPs. According to each sample's certainty, the algorithm divides a dataset into two subsets: cluster core samples and cluster halo samples. It then discovers a clear core structure from the cluster core samples and gradually assigns the cluster halo samples to that structure. Experiments on six synthetic datasets illustrate how the algorithm works. On twelve real datasets, the algorithm outperforms twelve state-of-the-art clustering ensemble algorithms.
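The core/halo idea in the abstract can be illustrated with a small sketch. The abstract does not state the certainty formula, so everything below is an assumption made for illustration: certainty is taken here as the mean of |2p - 1| over a sample's co-association values p, core samples are grouped by linking pairs that share a cluster in more than half of the BPs, and halo samples are assigned in a single pass rather than gradually. The function names (`co_association`, `sample_certainty`, `cluster_core`, `ensemble`) and the `threshold` and `link` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    P = np.asarray(partitions)                 # shape (n_partitions, n_samples)
    same = P[:, :, None] == P[:, None, :]      # shape (n_partitions, n, n)
    return same.mean(axis=0)

def sample_certainty(coassoc):
    """Assumed certainty (not the paper's formula): mean of |2p - 1| over the
    co-association values of a sample with every other sample. Values near 1
    mean the sample's neighbors rarely change across base partitions."""
    n = coassoc.shape[0]
    agreement = np.abs(2.0 * coassoc - 1.0)
    np.fill_diagonal(agreement, 0.0)           # ignore self-pairs
    return agreement.sum(axis=1) / (n - 1)

def cluster_core(coassoc, core, link=0.5):
    """Group core samples into connected components of the graph that links
    two samples when their co-association exceeds `link` (assumed rule)."""
    linked = coassoc[np.ix_(core, core)] > link
    labels = -np.ones(len(core), dtype=int)
    next_label = 0
    for i in range(len(core)):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(linked[j] & (labels < 0)):
                labels[k] = next_label
                stack.append(k)
        next_label += 1
    return labels

def ensemble(partitions, threshold=0.75):
    """Split samples into core and halo by certainty, cluster the core, then
    assign each halo sample to the core cluster it co-occurs with most often."""
    coassoc = co_association(partitions)
    certainty = sample_certainty(coassoc)
    core = np.flatnonzero(certainty >= threshold)
    halo = np.flatnonzero(certainty < threshold)
    core_labels = cluster_core(coassoc, core)
    labels = -np.ones(coassoc.shape[0], dtype=int)
    labels[core] = core_labels
    for h in halo:
        means = [coassoc[h, core[core_labels == c]].mean()
                 for c in range(core_labels.max() + 1)]
        labels[h] = int(np.argmax(means))
    return labels

# Four base partitions of seven samples; the last sample changes neighbors.
partitions = [
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 0],
]
print(ensemble(partitions, threshold=0.75))
```

In this toy input the first six samples keep the same neighbors in every BP and receive high certainty under the assumed formula, while the seventh sample moves between groups and is treated as a halo sample. A faithful implementation would replace the assumed certainty and assignment rules with those defined in the paper.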