Semi-Supervised Clustering with Partial Background Information.

Jing Gao,Pang-Ning Tan,Haibin Cheng
DOI: https://doi.org/10.1137/1.9781611972764.46
2006-01-01
Abstract:Incorporating background knowledge into unsupervised clustering algorithms has been the subject of extensive research in recent years. Nevertheless, existing algorithms implicitly assume that the background information, typically specified in the form of labeled examples or pairwise constraints, has the same feature space as the unlabeled data to be clustered. In this paper, we are concerned with a new problem of incorporating partial background knowledge into clustering, where the labeled examples have moderate overlapping features with the unlabeled data. We formulate this as a constrained optimization problem, and propose two learning algorithms to solve the problem, based on hard and fuzzy clustering methods. An empirical study performed on a variety of real data sets shows that our proposed algorithms improve the quality of clustering results with limited labeled examples.
What problem does this paper attempt to address?