Clustering Support Vector Machines for Unlabeled Data Classification

Juanying Xie, Chunxia Wang, Yan Zhang, Shuai Jiang
DOI: https://doi.org/10.1109/ictm.2009.5413037
2009-01-01
Abstract: This paper proposes clustering support vector machines (CSVM) for classifying unlabeled data. We often need to process, for example classify, large data sets that are wholly unlabeled, and labeling such data manually is impractical. Clustering algorithms can be used to generate labels for this kind of data. We combine each of three algorithms, the global k-means clustering algorithm, the fast global k-means algorithm, and another global k-means clustering algorithm using k-d trees, with the statistical F-distribution to generate labels for the wholly unlabeled data, and then train an SVM on the labeled data for classification. The proposed approach (CSVM) is tested on four synthetically generated data sets, all of which were wholly unlabeled. The experimental results show that CSVM is efficient at classifying wholly unlabeled data.
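The label-generation step described in the abstract can be illustrated with a minimal sketch. It assumes the commonly described incremental form of global k-means: start from one center at the data mean, then add one center at a time, trying every data point as the initial position of the new center and keeping the run with the lowest clustering error. The function names (`sq_dist`, `kmeans`, `global_kmeans`) and the toy data are hypothetical. The F-distribution step, the fast and k-d tree variants, and the SVM training are not shown, so this is not the authors' implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given initial centers.

    Returns (centers, labels, clustering_error).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    error = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, error


def global_kmeans(points, k):
    """Assumed incremental scheme: grow from 1 to k centers, one at a time."""
    n = len(points)
    centers = [tuple(sum(c) / n for c in zip(*points))]  # k = 1: the data mean
    labels = [0] * n
    for _step in range(1, k):
        best = None
        # Try every data point as the starting position of the new center.
        for candidate in points:
            result = kmeans(points, centers + [candidate])
            if best is None or result[2] < best[2]:
                best = result
        centers, labels, _error = best
    return centers, labels


# Toy unlabeled data: two well-separated 2-D blobs (illustrative only).
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(20)]

centers, labels = global_kmeans(points, 2)
print(labels)
```

The generated `labels` could then be passed, together with `points`, to any SVM trainer, for instance scikit-learn's `SVC().fit(points, labels)`. Note that the exhaustive search over candidate points makes this sketch quadratic in the number of points, which is the cost the fast and k-d tree variants mentioned in the abstract aim to reduce.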