A semi-supervised clustering approach using labeled data

A. Taghizabet, J. Tanha, A. Amini, J. Mohammadzadeh
DOI: https://doi.org/10.24200/sci.2022.58519.5772
2024-01-21
Scientia Iranica
Abstract: Over recent decades, interest in semi-supervised clustering has grown steadily. The reviewed literature indicates that, across a range of real-world problems, semi-supervised clustering methods are more powerful than purely supervised or unsupervised ones, and that even a small amount of supervised information can significantly improve the results of unsupervised methods. One popular way to incorporate partial supervision is through labeled data. In this study, we propose a semi-supervised clustering algorithm called ConvexClust. The proposed method improves data clustering by combining a geometric view, borrowed from the Lune concept in the connectivity index, with 10% labeled data. Clustering starts from the labeled data, which are used to form a convex hull. It then proceeds iteratively, labeling the unlabeled data and updating the convex hull at each step. Evaluations on three UCI datasets and sixteen artificial datasets show that the proposed method outperforms the compared semi-supervised and traditional clustering techniques.
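The iterative loop described in the abstract (seed each cluster's convex hull from labeled points, label an unlabeled point, update the hull, repeat) can be illustrated with a minimal sketch. This is not the authors' ConvexClust: the abstract does not specify the assignment rule or how the Lune concept is applied, so the sketch assumes 2D points and a simple "assign the unlabeled point closest to any hull" rule. The function names `convex_hull`, `hull_distance`, and `hull_growing_clusters` are hypothetical.

```python
from math import hypot

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def seg_dist(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0:
        return hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def hull_distance(p, hull):
    """0.0 if p lies inside or on the hull, else distance to the nearest edge."""
    n = len(hull)
    if n == 1:
        return hypot(p[0] - hull[0][0], p[1] - hull[0][1])
    if n >= 3 and all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n)):
        return 0.0
    return min(seg_dist(p, hull[i], hull[(i + 1) % n]) for i in range(n))

def hull_growing_clusters(points, seed_labels):
    """points: list of (x, y) tuples; seed_labels: {point index: label}.

    Assumed rule (not from the paper): at each step, label the unlabeled
    point that is closest to any cluster hull, then rebuild that hull.
    """
    labels = dict(seed_labels)
    members = {}
    for i, lab in labels.items():
        members.setdefault(lab, []).append(points[i])
    hulls = {lab: convex_hull(m) for lab, m in members.items()}
    unlabeled = [i for i in range(len(points)) if i not in labels]
    while unlabeled:
        _, i, lab = min(
            ((hull_distance(points[i], h), i, lab)
             for i in unlabeled for lab, h in hulls.items()),
            key=lambda t: t[0],
        )
        labels[i] = lab
        members[lab].append(points[i])
        hulls[lab] = convex_hull(members[lab])  # update the hull after each assignment
        unlabeled.remove(i)
    return [labels[i] for i in range(len(points))]
```

Calling `hull_growing_clusters(points, {0: "a", 5: "b"})` with one seed per cluster returns one label per input point. The greedy nearest-hull rule rescans every unlabeled point against every hull on each step, which is fine for a small illustration but would need a spatial index or batching on larger datasets.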
engineering, multidisciplinary