Semi-supervised Learning with Easy Labeled Data via Impartial Labeled Set Extension

Xuan Han, Mingyu You, Wanjing Ma
DOI: https://doi.org/10.1145/3607541.3616815
2023-01-01
Abstract: Traditional semi-supervised learning (SSL) methods usually assume that the labeled data are sampled independently and identically distributed (i.i.d.) from the underlying distribution. However, several related studies have shown that the i.i.d. assumption does not always hold. Under the influence of human preference or automatic labeling, the labels (or trusted labels) are in some cases concentrated in easy samples that have distinctive characteristics. Such a biased labeled set leads to severe misestimation of the decision boundaries during learning. In this paper, we propose a novel evolutionary SSL framework, Solar Eclipse (SE), to address this problem. The framework is based on progressively enlarging the labeled set with the closest unlabeled samples. Specifically, we design a novel relative distance measurement, Regional Label Propagation (R-LP). In R-LP, the sample space is divided into several regions according to class similarities, and the distance is calculated independently in each region with label propagation. This segregation strategy efficiently reduces the complexity of distance measurement in the feature space. Moreover, R-LP facilitates the ensemble of different feature views. In our practice, an unbiased self-supervised feature view is introduced to assist the measurement. Experiments show that this dual-view scheme helps find more reliable extension samples. Evaluation on popular SSL benchmarks shows that the proposed SE framework achieves state-of-the-art performance with easy labeled data. In addition, it also shows advantages when only a few i.i.d. labeled samples are provided, given that they may also have sampling bias.
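The core idea of progressively enlarging the labeled set with the closest unlabeled samples can be illustrated with a minimal sketch. It is not the authors' SE framework: plain Euclidean distance stands in for the R-LP measurement, there is no model training and no second feature view, and the function name `extend_labeled_set` and its parameters `n_rounds` and `n_add` are hypothetical.

```python
import math

def extend_labeled_set(points, labels, n_rounds=3, n_add=2):
    """Grow the labeled set round by round (illustrative assumption only).

    points: list of coordinate tuples.
    labels: list of class ids, with None marking unlabeled samples.
    Euclidean distance is a stand-in for the paper's R-LP measurement.
    """
    labels = list(labels)  # work on a copy
    for _ in range(n_rounds):
        labeled = [i for i, y in enumerate(labels) if y is not None]
        unlabeled = [i for i, y in enumerate(labels) if y is None]
        if not labeled or not unlabeled:
            break
        candidates = []
        for u in unlabeled:
            # Nearest currently labeled sample and its distance
            nearest = min(labeled, key=lambda l: math.dist(points[u], points[l]))
            dist = math.dist(points[u], points[nearest])
            candidates.append((dist, u, labels[nearest]))
        # Closest unlabeled samples first; ties broken by sample index
        candidates.sort()
        for _, u, y in candidates[:n_add]:
            labels[u] = y  # adopt the label of the nearest labeled sample
    return labels

# Two labeled "easy" samples at the extremes, four unlabeled in between
points = [(0.0,), (1.0,), (2.0,), (10.0,), (9.0,), (8.0,)]
labels = [0, None, None, 1, None, None]
result = extend_labeled_set(points, labels)
```

In this toy run the labeled set grows outward from the two seed samples, so samples near the seeds are labeled before those farther away.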