AuxMix: Semi-Supervised Learning with Unconstrained Unlabeled Data

Amin Banitalebi-Dehkordi, Pratik Gujjar, Yong Zhang

DOI: https://doi.org/10.48550/arXiv.2206.06959

2022-06-15

Abstract: Semi-supervised learning (SSL) has seen great strides when labeled data is scarce but unlabeled data is abundant. Critically, most recent works assume that such unlabeled data is drawn from the same distribution as the labeled data. In this work, we show that state-of-the-art SSL algorithms suffer a degradation in performance in the presence of unlabeled auxiliary data that does not necessarily possess the same class distribution as the labeled set. We term this problem Auxiliary-SSL and propose AuxMix, an algorithm that leverages self-supervised learning tasks to learn generic features in order to mask auxiliary data that are not semantically similar to the labeled set. We also propose to regularize learning by maximizing the predicted entropy for dissimilar auxiliary samples. We show an improvement of 5% over existing baselines on a ResNet-50 model when trained on the CIFAR10 dataset with 4k labeled samples and all unlabeled data drawn from the Tiny-ImageNet dataset. We report competitive results on several datasets and conduct ablation studies.
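
The abstract does not say how "semantically similar" is measured. As a minimal sketch, assuming each sample already has a feature vector from some self-supervised encoder, and assuming similarity is the cosine similarity between an auxiliary sample and the nearest per-class mean ("prototype") of the labeled features, the masking step could look like the code below. The function names `class_prototypes` and `similarity_mask`, the prototype choice, and the fixed threshold of 0.5 are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_prototypes(features, labels):
    """Hypothetical helper: unit-length mean of the normalized labeled features per class."""
    sums = {}
    counts = defaultdict(int)
    for f, y in zip(features, labels):
        f = l2_normalize(f)
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: l2_normalize([s / counts[y] for s in sums[y]]) for y in sums}


def similarity_mask(aux_features, prototypes, threshold=0.5):
    """Hypothetical helper: flag each auxiliary sample whose best cosine similarity
    to any labeled-class prototype reaches `threshold` (assumed value).

    Returns (mask, scores); mask[i] is True for samples kept as "similar".
    Requires at least one prototype.
    """
    mask, scores = [], []
    for f in aux_features:
        f = l2_normalize(f)
        best = max(sum(a * b for a, b in zip(f, p)) for p in prototypes.values())
        scores.append(best)
        mask.append(best >= threshold)
    return mask, scores


if __name__ == "__main__":
    # Toy 2-D "features": class 0 points right, class 1 points up.
    protos = class_prototypes([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], [0, 0, 1])
    mask, scores = similarity_mask([[1.0, 0.05], [-1.0, -1.0]], protos)
    print(mask)  # [True, False]
```

In this toy run, the first auxiliary vector lies almost on the class-0 prototype and is kept, while the second points away from both prototypes and is masked out. A learned or adaptive threshold would be an equally plausible design; the fixed value here only keeps the sketch short.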

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper addresses the performance degradation that existing semi-supervised learning (SSL) algorithms suffer when the unlabeled data is unconstrained, i.e., drawn from auxiliary sources. Most existing SSL methods assume that the unlabeled data comes from the same distribution as the labeled data. In practice this assumption often fails: the unlabeled data may come from a different distribution, for instance one with a different class distribution, and this mismatch can significantly reduce model performance. For example, with the same labeled dataset, classification accuracy can drop substantially when the unlabeled data is taken from a different dataset. The authors formalize this setting as a new problem, Auxiliary-SSL, and propose the AuxMix algorithm to address it. AuxMix uses self-supervised learning tasks to learn generic features, uses those features to mask auxiliary samples that are not semantically similar to the labeled dataset, and regularizes training by maximizing the entropy of predictions on the dissimilar auxiliary samples. Experimental results show that AuxMix improves model performance under the label-distribution mismatch introduced by auxiliary data; the abstract reports, for instance, a 5% gain over existing baselines for a ResNet-50 trained on CIFAR10 with 4k labeled samples and all unlabeled data drawn from Tiny-ImageNet.
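
The summary does not give the exact form of the entropy regularizer or how it is weighted. As a minimal sketch, assuming a softmax classifier, Shannon entropy, and a boolean mask (True meaning "similar to the labeled set") produced by an earlier filtering step, the term could be computed as the negative mean prediction entropy over the masked-out samples, so that minimizing it maximizes their entropy. The names `entropy_max_loss` and `weight` are hypothetical, and this is not the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_max_loss(aux_logits, similar_mask, weight=1.0):
    """Hypothetical regularizer: negative mean entropy over auxiliary samples
    flagged as dissimilar (similar_mask[i] is False).

    Minimizing this value pushes those predictions toward the uniform
    distribution. Returns 0.0 when no sample is flagged as dissimilar.
    """
    ents = [
        entropy(softmax(z))
        for z, is_similar in zip(aux_logits, similar_mask)
        if not is_similar
    ]
    if not ents:
        return 0.0
    return -weight * sum(ents) / len(ents)


if __name__ == "__main__":
    logits = [[5.0, 0.0, 0.0],  # confident prediction, but flagged similar: ignored
              [0.0, 0.0, 0.0]]  # uniform prediction on a dissimilar sample
    print(round(entropy_max_loss(logits, [True, False]), 4))  # -1.0986
```

Because a K-class distribution has entropy at most log K (reached by the uniform distribution), this term is bounded below by -weight * log K. It therefore cannot diverge, and its minimum is exactly "maximally uncertain" predictions on the dissimilar samples.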

Adaptive Semi-Supervised Mixup with Implicit Label Learning and Sample Ratio Balancing

MixMatch: A holistic approach to semi-supervised learning

Semi-supervised learning by selective training with pseudo labels via confidence estimation

Scaling Up Semi-supervised Learning with Unconstrained Unlabelled Data

RegMixMatch: Optimizing Mixup Utilization in Semi-Supervised Learning

Boosting Semi-Supervised Learning with Contrastive Complementary Labeling

Meta-Semi: A Meta-learning Approach for Semi-supervised Learning

Patch-Mixing Contrastive Regularization for Few-Label Semi-Supervised Learning

FMixCutMatch for Semi-Supervised Deep Learning

Improve Semi-supervised Learning with Metric Learning Clusters and Auxiliary Fake Samples

Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data

Un-mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning

When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets

ProMix: Combating Label Noise via Maximizing Clean Sample Utility

VCC-INFUSE: Towards Accurate and Efficient Selection of Unlabeled Examples in Semi-supervised Learning

MixDiff: Mixing Natural and Synthetic Images for Robust Self-Supervised Representations

On the Discriminability of Self-Supervised Representation Learning

ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

Enhancing Sample Utilization Through Sample Adaptive Augmentation in Semi-Supervised Learning

ADT-SSL: Adaptive Dual-Threshold for Semi-Supervised Learning