CoSTra: Confidence-based self-training

Shengjun Cheng, Qingcheng Huang, Jiafeng Liu, Xianglong Tang
DOI: https://doi.org/10.12733/jcis7941
2013-01-01
Journal of Computational Information Systems
Abstract: Self-training is a simple semi-supervised learning algorithm that iteratively refines a classifier using its own predictions on unlabeled data. In traditional self-training, the confidence of a prediction on unlabeled data is measured simply by the classifier's posterior output. This may be unreliable, because the initial classifier is trained on sparse labeled data and has only mediocre accuracy. Moreover, classification noise may inevitably accumulate and be amplified as the iterations proceed. As a result, the performance of self-training is usually unstable. In this paper, a novel self-training algorithm named Confidence-based Self-Training (CoSTra) is proposed, which exploits the manifold assumption to facilitate the self-labeling process. Specifically, a graph-based method with noise tolerance is used to help generate reliable predictions on unlabeled data. Further, to avoid introducing undesirable classification noise, a dedicated mechanism is adopted to augment the training set sequentially. Empirical results demonstrate that CoSTra can effectively improve classification performance. Copyright © 2013 Binary Information Press.
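The abstract does not say which graph-based method or augmentation rule CoSTra uses, so the following is only a rough sketch under stated assumptions. As the graph-based step it substitutes label spreading (Zhou et al.'s local and global consistency). Because its factor `alpha < 1` lets seed labels be overridden by their neighbours, it tolerates some label noise. For augmentation it uses a hypothetical acceptance rule: an unlabeled point is added only when a nearest-centroid base classifier agrees with the graph prediction, and at most `per_round` of the most confident such points are added per iteration. All function names and parameters (`sigma`, `alpha`, `per_round`) are illustrative, not the authors'.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    # Gaussian (RBF) affinity matrix with a zero diagonal (no self-loops).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def label_spreading(W, Y0, alpha=0.9, n_iter=50):
    # Assumed graph step: F <- alpha * S F + (1 - alpha) * Y0,
    # with S = D^{-1/2} W D^{-1/2}. alpha < 1 lets seed labels be revised.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1.0 - alpha) * Y0
    return F

def nearest_centroid_fit(X, y, n_classes):
    # Base classifier (hypothetical choice): one centroid per class.
    # Assumes every class has at least one labeled example.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def nearest_centroid_predict(C, X):
    d = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    return np.argmin(d, axis=1)

def confidence_self_training(X, y, labeled_mask, n_classes,
                             per_round=2, sigma=1.0, alpha=0.9):
    # y holds integer labels; entries where labeled_mask is False are ignored.
    y = y.copy()
    labeled = labeled_mask.copy()
    W = rbf_affinity(X, sigma)
    while not labeled.all():
        # One-hot seed matrix built from the current labeled set.
        Y0 = np.zeros((len(X), n_classes))
        idx = np.flatnonzero(labeled)
        Y0[idx, y[idx]] = 1.0
        F = label_spreading(W, Y0, alpha)
        graph_pred = F.argmax(axis=1)
        # Graph-based confidence: normalized score of the winning class.
        conf = F.max(axis=1) / np.maximum(F.sum(axis=1), 1e-12)
        centroids = nearest_centroid_fit(X[labeled], y[labeled], n_classes)
        clf_pred = nearest_centroid_predict(centroids, X)
        # Hypothetical acceptance rule: classifier and graph must agree.
        cand = np.flatnonzero(~labeled & (clf_pred == graph_pred))
        if cand.size == 0:
            break  # no trustworthy candidates left; stop augmenting
        chosen = cand[np.argsort(-conf[cand])[:per_round]]
        y[chosen] = graph_pred[chosen]
        labeled[chosen] = True
    return y, labeled
```

For example, with two well-separated clusters of four points each and one labeled seed per cluster (label `-1` marking unlabeled points), the loop labels the remaining six points two at a time. Adding only a few agreed-upon, high-confidence points per round is one simple way to limit the noise accumulation the abstract describes.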