K-means Based on Active Learning for Support Vector Machine.

Jie Gan, Ang Li, Qian-Lin Lei, Hao Ren, Yun Yang
DOI: https://doi.org/10.1109/icis.2017.7960089
2017-01-01
Abstract: In practice, unlabeled data can be collected cheaply and easily from the target domain, but obtaining a large amount of labeled data is difficult and expensive. How to use both labeled and unlabeled data to improve learning performance has therefore become a critical issue for many real-world applications. Active learning and semi-supervised learning are two natural solutions to this problem and have been studied intensively from different perspectives. In active learning, the learner has access to the entire dataset and actively queries the labels of selected instances; semi-supervised learning instead tries to improve the learner's performance by using labeled and unlabeled instances at the same time. In this paper, we propose an active-learning-based SVM approach, KA-SVM. Following the cluster hypothesis, we use k-means to construct a pre-selection scheme that extracts a subset of important instances as the training set, so that the SVM can be trained on this subset rather than on the entire dataset. We evaluated our approach on several benchmark datasets against other similar approaches, and the experimental results demonstrate that it performs outstandingly in both classification accuracy and computational efficiency.
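The abstract does not say which instances the k-means pre-selection keeps or which SVM solver is used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical selection rule (keep the `per_cluster` points nearest each k-means centroid, then query their labels), and it swaps in a Pegasos-style stochastic subgradient linear SVM in place of a standard QP solver. All function names and parameters (`preselect`, `per_cluster`, `train_linear_svm`, `lam`) are illustrative and not taken from the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def preselect(points, k, per_cluster=3, seed=0):
    """HYPOTHETICAL selection rule: keep the per_cluster points nearest each centroid."""
    centroids, assign = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        idx.sort(key=lambda i: sq_dist(points[i], centroids[c]))
        chosen.extend(idx[:per_cluster])
    return sorted(chosen)

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be in {-1, +1}. A constant 1.0 feature is appended so the
    last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * dot(w, Xb[i])
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinkage
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1

if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic Gaussian blobs; the labels y stand in for the query oracle.
    X = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(50)] + \
        [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(50)]
    y = [1] * 50 + [-1] * 50
    sel = preselect(X, k=4, per_cluster=3)  # only these labels are "queried"
    w = train_linear_svm([X[i] for i in sel], [y[i] for i in sel])
    acc = sum(predict(w, x) == yi for x, yi in zip(X, y)) / len(X)
    print(f"trained on {len(sel)} of {len(X)} points, accuracy {acc:.2f}")
```

The point of the sketch is the data flow: k-means runs on the unlabeled pool, only the pre-selected instances are labeled, and the SVM is fit on that small subset rather than on the whole dataset. A kernel SVM or a different selection rule (for example, favoring points near cluster boundaries, which are more likely to become support vectors) would slot into the same structure.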