Semi-Supervised Learning: Exploiting Unlabeled Data with Symmetrical Distribution and High Confidence

Yihao Zhang, Junhao Wen, Fangfang Tang, Zhuo Jiang
DOI: https://doi.org/10.1142/s0218001412510032
IF: 1.261
2012-01-01
International Journal of Pattern Recognition and Artificial Intelligence
Abstract: Existing representative work on semi-supervised incremental learning tends to select unlabeled instances predicted with high confidence for model retraining. However, this strategy may degrade classification performance rather than improve it, because relying on high confidence for data selection can lead to an erroneous estimate of the true distribution, especially when the confidence annotator is highly correlated with the classifier. In this paper, a new semi-supervised incremental learning algorithm is proposed that selects high-confidence unlabeled instances with a symmetrical distribution from the unlabeled data, which can reduce the bias in the estimate to some degree. In detail, the expectation maximization algorithm is used to estimate the confidence of each instance, and a Gaussian function is used to calculate the data distribution; the selected unlabeled data are then used to retrain the model with a classifier algorithm. Experimental results on a large number of UCI data sets show that our algorithm can effectively exploit unlabeled data to enhance learning performance.
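The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact symmetry criterion, so the sketch assumes one-dimensional features, per-class Gaussian parameters that have already been estimated (for example by EM), and a reading of "symmetrical distribution" as keeping equally many high-confidence points on each side of the class mean. The names `gaussian_pdf`, `posterior`, `select_symmetric`, and the `threshold` value are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, params, priors):
    """E-step style posterior P(class | x) under per-class 1-D Gaussians."""
    joint = [p * gaussian_pdf(x, m, s) for (m, s), p in zip(params, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def select_symmetric(unlabeled, params, priors, threshold=0.9):
    """Pick high-confidence points, balanced on both sides of each class mean.

    Assumption: "symmetrical" means the same number of selected points
    to the left and to the right of the class mean.
    """
    selected = []
    for k, (mean, _std) in enumerate(params):
        left, right = [], []
        for x in unlabeled:
            post = posterior(x, params, priors)
            best = max(range(len(post)), key=post.__getitem__)
            # Keep only points confidently assigned to class k.
            if best == k and post[k] >= threshold:
                (left if x < mean else right).append(x)
        # Keep equally many points per side, closest to the mean first.
        n = min(len(left), len(right))
        left.sort(key=lambda v: mean - v)
        right.sort(key=lambda v: v - mean)
        selected += [(x, k) for x in left[:n] + right[:n]]
    return selected


# Two hypothetical classes centred at 0 and 5 with unit variance.
params = [(0.0, 1.0), (5.0, 1.0)]
priors = [0.5, 0.5]
unlabeled = [-1.0, -0.5, 0.3, 1.0, 2.5, 4.0, 5.5, 6.0, 7.0]
pseudo_labeled = select_symmetric(unlabeled, params, priors)
```

In this sketch the point at 2.5 is dropped because neither class reaches the confidence threshold, and the points at 6.0 and 7.0 are dropped because only one high-confidence point lies to the left of the second class mean. The returned pairs would then be added to the labeled set before retraining the classifier, a step the sketch leaves out.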