Multi-Scale Constrained Deep Embedded Clustering
Chongwei Xie, Senlin Luo, Jinjie Zhou, Chenggang Cui, Limin Pan
DOI: https://doi.org/10.2139/ssrn.4657422
2023-01-01
Abstract: Deep semi-supervised clustering methods aim to leverage neural networks to learn dependable feature representations from growing volumes of high-dimensional data. In this setting, constructing constraints from the data in a non-random way, whether by preferential selection or automatically, is both essential and challenging. However, existing methods construct pairwise constraints randomly, which yields invalid and imbalanced constraints and pays little attention to samples located at cluster boundaries; the result is blurred cluster boundaries and inadequate separation among clusters. Meanwhile, methods that incorporate expertise-based cluster size constraints can severely misclassify samples, a problem that is exacerbated in methods without such constraints. In this paper, a multi-scale constrained deep embedded clustering model (MCDEC) is proposed that comprehensively considers constraints at three scales: sample-sample, sample-cluster, and cluster-cluster. Specifically, fuzzy clustering and uncertainty measurement are employed to construct pairwise constraints that are more informative and cover all classes. In addition, selecting seed sets to initialize the cluster centers avoids centroid deviation. Global size constraints that adapt to inter-cluster proportions, and thus align closely with the real data distribution, strengthen the constraints between clusters. Experiments on various datasets empirically demonstrate that MCDEC outperforms state-of-the-art methods, considerably reducing the number of misclassified samples while making clusters more compact and farther apart.