Samples selection in semi-supervised classification

Jiao Wang, Siwei Luo, LianWei Zhao
2006-01-01
Journal of Computational Information Systems
Abstract: In semi-supervised classification, labeled data are used together with unlabeled data to boost the performance of learning algorithms. Because labeled data strongly affect classification accuracy, it matters which data are chosen to be labeled, namely the data that provide the most information. This paper proposes a method to solve this problem. Based on the assumption that data near the separating surface provide more information, the data nearest to the separating surface are selected and labeled by the user, and the semi-supervised algorithm is then run with all the labeled data to obtain a new separating surface. The process is repeated several times to obtain the final classification. Experimental results on synthetic data show that the algorithm achieves satisfactory classification results with this sample selection method.
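The selection loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the semi-supervised algorithm or the distance measure, so a simple self-training nearest-centroid classifier stands in for the learner, and the separating surface is taken to be the hyperplane bisecting the two class centroids. All function names (`make_blobs`, `fit_semi_supervised`, `select_and_label`) and the `oracle` callback, which plays the role of the user supplying labels, are hypothetical.

```python
import random

def make_blobs(n_per_class=40, seed=0):
    """Synthetic 2-D data: two Gaussian blobs (an assumption, for illustration)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, cx in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 0.7), rng.gauss(0.0, 0.7)))
            y.append(label)
    return X, y

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def margin(cents, x):
    """Squared distance to centroid 0 minus squared distance to centroid 1.
    For fixed centroids this is proportional to the signed distance from x
    to the bisecting hyperplane, so |margin| ranks points by closeness to it."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1]))
    return d0 - d1

def predict(cents, x):
    return 1 if margin(cents, x) > 0 else 0

def fit_semi_supervised(X, labels, n_iter=10):
    """Stand-in semi-supervised learner (assumed, not from the paper).
    `labels` maps index -> class and must contain both classes.
    Labeled points keep their class; unlabeled points are pseudo-labeled."""
    cents = [centroid([X[i] for i, c in labels.items() if c == k]) for k in (0, 1)]
    for _ in range(n_iter):
        assigned = [labels.get(i, predict(cents, x)) for i, x in enumerate(X)]
        cents = [centroid([x for x, c in zip(X, assigned) if c == k]) for k in (0, 1)]
    return cents

def select_and_label(X, oracle, labels, rounds=5):
    """Repeatedly query the unlabeled point nearest the current surface."""
    labels = dict(labels)
    cents = fit_semi_supervised(X, labels)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(X)) if i not in labels]
        if not unlabeled:
            break
        query = min(unlabeled, key=lambda i: abs(margin(cents, X[i])))
        labels[query] = oracle(query)            # the user provides the label
        cents = fit_semi_supervised(X, labels)   # refit with all labeled data
    return cents, labels
```

A possible usage: call `X, y = make_blobs()`, start from one labeled point per class, for example `{0: y[0], 40: y[40]}`, and pass `lambda i: y[i]` as the oracle. Each round adds exactly one queried label, so the labeling cost is the number of rounds plus the initial labels.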