Active Semi-supervised K-Means Clustering Based on Silhouette Coefficient

Hongchen Guo, Junbang Ma, Zhiqiang Li
DOI: https://doi.org/10.1007/978-3-030-02804-6_27
2019-01-01
Abstract: The effectiveness of a semi-supervised clustering algorithm can depend on the quality of its labeled samples, so researchers integrate active learning with semi-supervised clustering to guide model learning. This paper presents an active semi-supervised k-means clustering model based on the silhouette coefficient (SCKmeans). SCKmeans builds on a pairwise-constraint clustering method (PCKmeans) and uses the silhouette coefficient to actively select valuable samples, for which it establishes constraints by querying an oracle. Model learning iterates until the number of queries reaches a threshold or the clustering algorithm achieves acceptable performance. SCKmeans optimizes semi-supervised k-means with a Local Sample Density (LDS) sampling strategy to ensure the stability of the algorithm. In addition, a distance-based sampling method, which reduces the number of queries while increasing the number of constrained samples, is introduced to optimize the process of establishing pairwise constraints. These two methods significantly improve the effectiveness of the clustering algorithm. We conduct extensive experiments over various datasets and baselines; the results indicate that our model performs better, improving MI and ARI by 5% and 6%, respectively.
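The abstract does not spell out how the silhouette coefficient drives sample selection, so the following is only a minimal sketch under stated assumptions: it computes the standard per-sample silhouette s(i) = (b - a) / max(a, b) and treats the samples with the lowest scores as query candidates. The function names `silhouette_scores` and `select_queries`, the `budget` parameter, and the "lowest score first" rule are hypothetical illustrations, not the authors' SCKmeans procedure.

```python
import math


def silhouette_scores(points, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b).

    a: mean distance from sample i to the other members of its own cluster.
    b: smallest mean distance from sample i to the members of any other cluster.
    Assumes at least two distinct cluster labels are present.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def select_queries(points, labels, budget):
    """Assumed selection rule: return the indices of the `budget` samples
    with the lowest silhouette, i.e. the most ambiguously assigned ones."""
    scores = silhouette_scores(points, labels)
    return sorted(range(len(points)), key=lambda i: scores[i])[:budget]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
    labs = [0, 0, 1, 1, 0]
    print(select_queries(pts, labs, budget=1))
```

In this toy input the fifth point lies midway between the two groups, so its silhouette is close to zero and it is the first candidate an oracle would be asked about. In a loop of the kind the abstract describes, the answers would become must-link or cannot-link constraints for the next PCKmeans pass.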