Novel semi-supervised clustering algorithm based on active data selection

WEN Ping, LENG Ming-wei, CHEN Xiao-yun
DOI: https://doi.org/10.3969/j.issn.1001-3695.2012.08.010
2012-01-01
Abstract: Semi-supervised clustering, which aims to improve clustering results using limited supervision, has become a research focus in data mining and machine learning in recent years. However, the accuracy of existing semi-supervised clustering algorithms is low when the dataset has little labeled data, or when it is multi-density and unbalanced. Drawing on active learning, this paper studies data selection and presents a novel semi-supervised clustering algorithm. The algorithm selects information-rich data points to be labeled by combining the ideas of minimum spanning tree clustering and active learning, and then uses a KNN-like technique to propagate the labels. Evaluated on several standard UCI datasets and synthetic datasets, the proposed method shows markedly higher accuracy and more stable performance than the compared methods, even when the datasets are multi-density and unbalanced.
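The abstract names the two stages (selecting points to label with the help of minimum spanning tree clustering, then propagating labels with a KNN-like step) but gives no details. The following is only a minimal sketch of that general idea, not the authors' algorithm. Every concrete choice in it is an assumption made for illustration: cutting the longest MST edges to form groups, querying the medoid of each group, and spreading labels with a 1-nearest-neighbor rule. The function names mst_edges, select_queries, and propagate are hypothetical, as is the toy dataset.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (dist, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n          # (distance to the tree, tree endpoint)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges

def select_queries(points, n_queries):
    """Assumed selection rule: cut the n_queries - 1 longest MST edges,
    then pick the medoid of each resulting component as the point to label."""
    kept = sorted(mst_edges(points))[: len(points) - n_queries]
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return [min(g, key=lambda i: sum(math.dist(points[i], points[j]) for j in g))
            for g in groups.values()]

def propagate(points, labeled):
    """Assumed KNN-like rule (k = 1): repeatedly give the unlabeled point that is
    closest to any labeled point the label of that nearest labeled point."""
    labels = dict(labeled)
    unlabeled = set(range(len(points))) - set(labels)
    while unlabeled:
        _, u, src = min((math.dist(points[u], points[l]), u, l)
                        for u in unlabeled for l in labels)
        labels[u] = labels[src]
        unlabeled.remove(u)
    return [labels[i] for i in range(len(points))]

# Toy data: a tight group and a sparser group (hypothetical example).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 14), (14, 10)]
truth = [0, 0, 0, 1, 1, 1]
queries = select_queries(points, 2)
labeled = {i: truth[i] for i in queries}   # the "oracle" answers the queries
predicted = propagate(points, labeled)
print(queries, predicted)
```

Growing labels outward from the nearest already-labeled point, rather than assigning every point to its nearest queried point in one pass, lets labels follow chains of close neighbors, which is one plausible reason a KNN-like propagation could cope with groups of differing density. The brute-force loops here are cubic in the number of points and are meant for reading, not for large datasets.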