Combining Clustering Coefficient-Based Active Learning and Semi-Supervised Learning on Networked Data

Xiaoqi He, Yangguang Liu, Bin Xu, Xiaogang Jin
DOI: https://doi.org/10.1109/iske.2010.5680858
2010-01-01
Abstract: Active learning and semi-supervised learning are both important techniques for improving a learned model with unlabeled data when labeled data is difficult to obtain but unlabeled data is plentiful and easy to collect. Combining active learning with a semi-supervised learning algorithm based on Gaussian fields and harmonic functions was suggested recently. That work showed that empirical risk minimization (ERM) can effectively find the next instance to label, but at a large computational cost. When the data is graph-structured, graph topological analysis can be used to rapidly select instances that are likely to be good candidates for labeling. This paper describes a novel approach that uses the clustering coefficient metric to identify the best next instance to label. We experiment on the 20 Newsgroups dataset with three binary classification tasks; the results show that the clustering coefficient strategy performs similarly to ERM while taking less time.
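To make the idea of topology-based selection concrete, here is a minimal sketch of ranking unlabeled nodes by their local clustering coefficient (the fraction of a node's neighbor pairs that are themselves connected). The abstract does not say whether high or low coefficients are preferred, how ties are broken, or how the graph is built from the documents, so the `prefer_high` flag, the alphabetical tie-break, and the function names `local_clustering` and `pick_next_to_label` are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `adj` maps each node to the set of its neighbours.
    """
    neighbours = adj[node] - {node}
    k = len(neighbours)
    if k < 2:
        # Fewer than two neighbours: no neighbour pairs, coefficient is 0.
        return 0.0
    # Count neighbour pairs that are directly linked to each other.
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def pick_next_to_label(adj, labeled, prefer_high=True):
    """Return the unlabeled node with the extreme clustering coefficient.

    ASSUMPTION: the direction of the ranking (`prefer_high`) is a guess;
    the abstract does not state it. Ties go to the smallest node name.
    """
    candidates = sorted(n for n in adj if n not in labeled)
    if not candidates:
        return None
    chooser = max if prefer_high else min
    return chooser(candidates, key=lambda n: local_clustering(adj, n))


# Toy graph: a triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
next_node = pick_next_to_label(graph, labeled={"a"})
```

Each query costs only a pass over local neighborhoods, which is the kind of cheap graph statistic the abstract contrasts with the heavier ERM computation.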