Abstract: Semi-supervised learning (SSL) commonly exhibits confirmation bias, where models disproportionately favor certain classes, leading to errors in predicted pseudo labels that accumulate under a self-training paradigm. Unlike supervised settings, which benefit from a rich, static data distribution, SSL inherently lacks mechanisms to correct this self-reinforced bias, necessitating debiasing interventions at each training step. Although the generation of debiased pseudo labels has been extensively studied, their effective utilization remains underexplored. Our analysis indicates that data from biased classes should have a reduced influence on parameter updates, while more attention should be given to underrepresented classes. To address these challenges, we introduce TaMatch, a unified framework for debiased training in SSL. TaMatch employs a scaling ratio derived from both a prior target distribution and the model's learning status to estimate and correct bias at each training step. This ratio adjusts the raw predictions on unlabeled data to produce debiased pseudo labels. In the utilization phase, these labels are weighted differently according to their predicted class, which makes training more equitable across classes and reduces class bias. Additionally, TaMatch dynamically adjusts the target distribution in response to the model's learning progress, facilitating robust handling of practical scenarios where the prior distribution is unknown. Empirical evaluations show that TaMatch significantly outperforms existing state-of-the-art methods across a range of challenging image classification tasks, highlighting the critical importance of both the debiased generation and utilization of pseudo labels in SSL.
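
The abstract describes two steps, rescaling raw predictions by a per-class ratio and then weighting pseudo labels by predicted class, without giving formulas. The following is a minimal sketch of how such a pipeline might look, not the TaMatch implementation. It assumes the ratio takes the familiar distribution-alignment form (target distribution divided by a running average of the model's predictions), that the same ratio is reused as the per-class loss weight, and that the "learning status" is tracked with an exponential moving average. The function names `update_model_dist` and `debias_and_weight` and the `momentum` parameter are hypothetical.

```python
import numpy as np

def update_model_dist(model_dist, probs, momentum=0.99):
    """Assumed proxy for 'learning status': an EMA of the mean prediction."""
    probs = np.asarray(probs, dtype=float)
    return momentum * np.asarray(model_dist, dtype=float) + (1.0 - momentum) * probs.mean(axis=0)

def debias_and_weight(probs, target_dist, model_dist, eps=1e-8):
    """Rescale predictions toward a target distribution and weight by class.

    probs:       (N, C) softmax outputs on unlabeled data
    target_dist: (C,) prior target class distribution (assumed known here)
    model_dist:  (C,) running estimate of the model's class distribution
    """
    probs = np.asarray(probs, dtype=float)
    target_dist = np.asarray(target_dist, dtype=float)
    model_dist = np.asarray(model_dist, dtype=float)

    # Assumption: per-class scaling ratio = target / current model estimate.
    ratio = target_dist / (model_dist + eps)

    # Generation: rescale raw predictions, then renormalize each row.
    adjusted = probs * ratio
    adjusted /= adjusted.sum(axis=1, keepdims=True)
    pseudo_labels = adjusted.argmax(axis=1)

    # Utilization (assumption): over-predicted classes get a weight below 1,
    # under-predicted classes a weight above 1.
    weights = ratio[pseudo_labels]
    return adjusted, pseudo_labels, weights

# Toy example: the model over-predicts class 0 relative to a uniform target.
probs = np.array([[0.6, 0.4], [0.9, 0.1]])
adjusted, pseudo_labels, weights = debias_and_weight(
    probs, target_dist=[0.5, 0.5], model_dist=[0.8, 0.2]
)
```

In this toy case the first sample flips from the over-predicted class 0 to class 1 after rescaling, and its pseudo label receives a larger weight than the second sample, which stays in class 0. In a training loop, `weights` would multiply the per-sample unlabeled loss; how TaMatch actually derives its ratio and weights should be taken from the paper itself.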

Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning

Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding

An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning

Learning sample-aware threshold for semi-supervised learning

LaSSL: Label-Guided Self-Training for Semi-supervised Learning

SemiReward: A General Reward Model for Semi-supervised Learning

Leveraging Local Variance for Pseudo-Label Selection in Semi-supervised Learning

A Survey of Class-Imbalanced Semi-Supervised Learning

In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning

Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective

Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

Towards Self-Adaptive Pseudo-Label Filtering for Semi-Supervised Learning

Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning

Improving Barely Supervised Learning by Discriminating Unlabeled Samples with Super-Class

Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning

On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning

A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification

DCRP: Class-Aware Feature Diffusion Constraint and Reliable Pseudo-Labeling for Imbalanced Semi-Supervised Learning

Boosting Semi-Supervised Learning under Imbalanced Regression Via Pseudo-Labeling

Robust Pseudo-Label Selection for Holistic Semi-Supervised Learning