Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion

Jinfeng Yi,Lijun Zhang,Rong Jin,Qi Qian,Anil K. Jain
2013-01-01
Abstract:Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints. However, there are two main shortcomings of the existing semi-supervised clustering algorithms. First, they have to deal with non-convex optimization problems, leading to clustering results that are sensitive to the initialization. Second, none of these algorithms is equipped with theoretical guarantee regarding the clustering performance. We address these limitations by developing a framework for semisupervised clustering based on input pattern assisted matrix completion . The key idea is to cast clustering into a matrix completion problem, and solve it efficiently by exploiting the correlation between input patterns and cluster assignments. Our analysis shows that under appropriate conditions, only O (log n ) pairwise constraints are needed to accurately recover the true cluster partition. We verify the effectiveness of the proposed algorithm by comparing it to the state-of-the-art semisupervised clustering algorithms on several benchmark datasets.
What problem does this paper attempt to address?