Scalable Graph-Based Semi-Supervised Learning Through Sparse Bayesian Model.

Bingbing Jiang, Huanhuan Chen, Bo Yuan, Xin Yao
DOI: https://doi.org/10.1109/tkde.2017.2749574
IF: 9.235
2017-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Semi-supervised learning (SSL) concerns the problem of how to improve classifiers’ performance by making use of prior knowledge from unlabeled data. In recent years, many SSL methods have been developed to integrate unlabeled data into classifiers based on either the manifold or the cluster assumption. In particular, graph-based approaches, which follow the manifold assumption, have achieved promising performance in many real-world applications. However, most of them work well only on small-scale data sets and lack probabilistic outputs. In this paper, a scalable graph-based SSL framework through a sparse Bayesian model is proposed by defining a graph-based sparse prior. Based on the traditional Bayesian inference technique, a sparse Bayesian SSL algorithm (SBS$^2$L) is obtained, which can remove irrelevant unlabeled samples and make probabilistic predictions for out-of-sample data. Moreover, in order to scale SBS$^2$L to large-scale data sets, an incremental SBS$^2$L (ISBS$^2$L) is derived. The key idea of ISBS$^2$L is to employ an incremental strategy that sequentially selects parts of the unlabeled samples that contribute to the learning, instead of using all available unlabeled samples directly. ISBS$^2$L has lower time and space complexities than previous SSL algorithms that use all unlabeled samples. Extensive experiments on various data sets verify that our algorithms can achieve comparable classification effectiveness and efficiency with much better scalability. Finally, the generalization error bound is derived based on robustness analysis.
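To make the graph-based setting concrete, the following is a minimal sketch of classic label spreading over a dense RBF similarity graph. It is a generic stand-in chosen for illustration, not the paper's SBS$^2$L or ISBS$^2$L, whose sparse prior and incremental selection rule are not specified in the abstract. The function name `label_spreading` and the parameters `sigma` and `alpha` are assumptions introduced here. The sketch also shows the two limitations the abstract mentions: the dense graph over all samples needs O(n^2) memory, and the output is a normalized score rather than a calibrated probability.

```python
import numpy as np

def label_spreading(X, y, sigma=1.0, alpha=0.9):
    """Generic graph-based SSL baseline (label spreading); NOT SBS^2L.

    X: (n, d) array of samples.
    y: (n,) integer labels, with -1 marking unlabeled samples.
    Returns the sorted class labels and an (n, k) array of class scores,
    normalized so each row sums to 1 (scores, not calibrated probabilities).
    """
    n = X.shape[0]
    classes = np.unique(y[y >= 0])

    # Dense RBF affinity graph over ALL samples: O(n^2) memory.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    S = W / np.sqrt(np.outer(degree, degree))

    # One-hot matrix of the known labels; unlabeled rows stay all-zero.
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0

    # Closed-form solution F = (I - alpha * S)^{-1} Y: an O(n^3) solve.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes, F / F.sum(axis=1, keepdims=True)
```

Calling `label_spreading(X, y)` on data with one labeled sample per cluster and taking the row-wise `argmax` of the scores assigns each unlabeled point to the class of its cluster, which is the manifold assumption in action. The cost of building and solving over the full graph is the kind of bottleneck that selecting only a subset of unlabeled samples is meant to avoid.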