An improved co-training style algorithm: Compatible Co-training

Xiangyu Guo, Wei Wang
DOI: https://doi.org/10.13232/j.cnki.jnju.2016.04.011
2016-01-01
Abstract: Semi-supervised learning has been a popular direction in machine learning. It focuses on using unlabeled data to assist learning from labeled data. One of its major paradigms exploits the disagreement between multiple classifiers, and co-training is arguably the most classical representative of this paradigm. The co-training algorithm assumes a two-view setting: it trains one classifier on each view and lets the two classifiers iteratively label new instances for each other to enlarge the training set. It has been proved that when both views are sufficient, co-training can find the optimal classifier on each view. In practice, however, views may be corrupted by feature degradation or noise, so that a single view may not provide enough information to determine an instance's label perfectly. In such situations, the optimal classifiers of the two views may no longer be compatible, meaning that some labels provided by one view's classifier may mislead the other. To mitigate the effects of view insufficiency, we propose an improved co-training algorithm named Compatible Co-training, which tries to automatically identify and eliminate misleadingly labeled instances. In each iteration, the algorithm records the labels assigned to newly labeled instances. The updated classifier then predicts labels for all instances labeled by the other classifier, and instances with conflicting labels are dynamically eliminated. Experiments show that in most cases Compatible Co-training generalizes better and converges faster than the original co-training algorithm. Moreover, Compatible Co-training remains robust when the classifiers on the two views differ greatly in initial accuracy, whereas the performance of co-training deteriorates significantly.
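The abstract describes the record-and-eliminate step only at a high level, so the following is a minimal Python sketch under our own assumptions, not the authors' implementation. Assumed choices: a nearest-centroid base classifier whose confidence is the distance margin between the two closest centroids, a fixed number `k` of most-confident instances labeled per view per round, a separate pool of received labels for each view, and elimination that is re-evaluated every round (so an instance dropped earlier may be re-admitted). All function names (`compatible_co_training`, `fit_view`, `train_centroids`, `make_toy_data`) and the toy data are hypothetical.

```python
import math
import random


def train_centroids(points, labels):
    """Fit a nearest-centroid classifier: one mean vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest class centroids."""
    dists = sorted((math.dist(c, x), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def fit_view(X, v, train):
    """Train a classifier on view v; `train` maps instance index -> label."""
    idx = list(train)
    return train_centroids([X[i][v] for i in idx], [train[i] for i in idx])


def compatible_co_training(X, labeled, rounds=10, k=2):
    """X[i] = (view0_features, view1_features); `labeled` maps index -> label.

    record[v] holds every label the OTHER view's classifier assigned for view v;
    active[v] is the subset kept because the updated view-v classifier agrees.
    """
    unlabeled = [i for i in range(len(X)) if i not in labeled]
    record, active = [{}, {}], [{}, {}]
    for _ in range(rounds):
        if not unlabeled:
            break
        picked = set()
        for v in (0, 1):
            other = 1 - v
            # The other view's classifier labels its k most confident instances for view v.
            clf = fit_view(X, other, {**labeled, **active[other]})
            ranked = sorted(unlabeled, key=lambda i: -predict(clf, X[i][other])[1])
            for i in ranked[:k]:
                record[v][i] = predict(clf, X[i][other])[0]
                picked.add(i)
        unlabeled = [i for i in unlabeled if i not in picked]
        for v in (0, 1):
            # Update view v's classifier with the labels it just received ...
            new = {i: record[v][i] for i in picked if i in record[v]}
            clf = fit_view(X, v, {**labeled, **active[v], **new})
            # ... then re-check ALL recorded labels and drop conflicting ones.
            # Assumption: re-evaluated every round, so a dropped instance may return.
            active[v] = {i: lab for i, lab in record[v].items()
                         if predict(clf, X[i][v])[0] == lab}
    return [fit_view(X, v, {**labeled, **active[v]}) for v in (0, 1)], active


def make_toy_data(n, seed=0):
    """Hypothetical toy data: two classes, two 2-D views, class means at 0 and 4."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        c = i % 2
        X.append(tuple(tuple(rng.gauss(4.0 * c, 1.0) for _ in range(2))
                       for _ in range(2)))
        y.append(c)
    return X, y


X, y = make_toy_data(60)
labeled = {i: y[i] for i in range(4)}  # two labeled examples per class
clfs, kept = compatible_co_training(X, labeled, rounds=10, k=2)
for v in (0, 1):
    acc = sum(predict(clfs[v], X[i][v])[0] == y[i] for i in range(len(X))) / len(X)
    print(f"view {v}: accuracy {acc:.2f}, kept {len(kept[v])} pseudo-labels")
```

The nearest-centroid model only keeps the sketch dependency-free; any base classifier that reports a confidence score could be swapped into `fit_view` and `predict` without changing the co-training loop.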