Abstract: This paper investigates a range of empirical risk functions and regularization methods suitable for self-training methods in semi-supervised learning. These approaches draw inspiration from divergence measures, namely $f$-divergences and $\alpha$-Rényi divergences. Building on the theoretical foundations of these divergences, we also provide insights that clarify the behavior of our empirical risk functions and regularization techniques. Self-training methods for semi-supervised learning, such as pseudo-labeling and entropy minimization, suffer from an inherent mismatch between the true labels and the pseudo-labels (noisy pseudo-labels), and some of our empirical risk functions are robust to such noise. Under some conditions, our empirical risk functions outperform traditional self-training methods.

What problem does this paper attempt to address?

This paper investigates robust self-training methods in semi-supervised learning (SSL) by using the $f$-divergence and the $\alpha$-Rényi divergence to design new empirical risk functions and regularization techniques. The main contributions are:

1. Propose novel risk functions based on different divergences, including the $f$-divergence and the $\alpha$-Rényi divergence.
2. Combine these risk functions with self-training methods such as pseudo-labeling and entropy minimization for SSL applications.
3. Provide upper bounds on the ideal performance (assuming access to the true labels of all unlabeled data) for certain divergences used as distance measures.
4. Provide empirical analysis demonstrating the performance of the new risk functions and regularizers in the presence of noisy pseudo-labels.

The paper first introduces the background of SSL, highlighting the challenges that traditional supervised learning methods face when labeled data is scarce. It then defines the $f$-divergence and the $\alpha$-Rényi divergence in detail and discusses the roles of soft labels and hard labels in classification problems.

Next, the paper proposes the divergence-based empirical risk function (DER) for supervised learning (SL) and SSL applications. In SL, DER measures the discrepancy between the model predictions and the true label distribution. In SSL, particularly with pseudo-labeling and entropy minimization, DER considers the joint distribution of labeled and unlabeled data.

The paper also discusses the robustness of these new methods, particularly when pseudo-labels are inaccurate (i.e., they do not match the true labels). For certain divergences, upper bounds on performance can be derived. For example, the total variation distance, the Le Cam distance, and the Jensen-Shannon divergence are shown to be robust in certain cases.
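To make the divergences named above concrete, the following is a minimal NumPy sketch of their standard textbook definitions for discrete distributions. It is not the paper's DER: the function names, the `EPS` floor, and the Le Cam normalization (one common convention, which may differ from the paper's by a constant factor) are assumptions made for illustration. The total variation, Le Cam, and Jensen-Shannon values are bounded, whereas the KL divergence from a one-hot label (the usual cross-entropy loss) grows without bound as the predicted probability of that label goes to zero, which is one intuition for why bounded divergences may tolerate wrong pseudo-labels better.

```python
import numpy as np

EPS = 1e-12  # numerical floor; an implementation choice, not from the paper


def total_variation(p, q):
    """TV(P, Q) = 0.5 * sum |p - q|, bounded in [0, 1]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum(axis=-1)


def kl(p, q):
    """KL(P || Q), using the convention 0 * log(0 / q) = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    ratio = np.where(p > 0, p / np.maximum(q, EPS), 1.0)
    return (p * np.log(ratio)).sum(axis=-1)


def jensen_shannon(p, q):
    """JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), M = (P + Q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def le_cam(p, q):
    """Le Cam distance, here normalized as 0.5 * sum (p - q)^2 / (p + q)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * ((p - q) ** 2 / np.maximum(p + q, EPS)).sum(axis=-1)


def renyi(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1):
    D_alpha(P || Q) = log(sum p^alpha * q^(1 - alpha)) / (alpha - 1)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    q = np.maximum(q, EPS)
    return np.log((p ** alpha * q ** (1.0 - alpha)).sum(axis=-1)) / (alpha - 1.0)


if __name__ == "__main__":
    # A wrong hard pseudo-label (class 0) against a confident prediction of class 1.
    pseudo_label = [1.0, 0.0]
    prediction = [1e-6, 1.0 - 1e-6]
    print("cross-entropy (KL from one-hot):", kl(pseudo_label, prediction))
    print("total variation:", total_variation(pseudo_label, prediction))
    print("Jensen-Shannon:", jensen_shannon(pseudo_label, prediction))
    print("Le Cam:", le_cam(pseudo_label, prediction))
```

In this example the cross-entropy term is large while the three bounded divergences stay below their maxima (1, log 2, and 1 under the conventions used here), so a single mislabeled point contributes only a limited amount to the averaged risk.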
Finally, the paper proposes two algorithms: Divergence-based Pseudo-labeling SSL (DP-SSL) and Divergence-based Entropy Minimization SSL (DEM-SSL). Experimental results show that these methods outperform traditional self-training methods in certain scenarios on different datasets, and that they are more robust in the presence of noisy pseudo-labels or imbalanced data. In summary, this paper aims to improve self-training methods in semi-supervised learning by introducing new risk functions and regularization techniques that enhance performance and robustness when only a small amount of labeled data is available.
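The general shape of a self-training objective of this kind can be sketched as follows. This is an illustrative outline only, not the authors' DP-SSL or DEM-SSL: the summary does not give their exact losses, so the total variation loss, the confidence `threshold`, the `weight` on the unlabeled term, the use of plain Shannon entropy as the regularizer, and all function names are assumptions.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[np.asarray(labels)]


def tv_loss(target, probs):
    """Total variation between target and predicted distributions, per sample."""
    return 0.5 * np.abs(target - probs).sum(axis=-1)


def pseudo_label_risk(logits_lab, y_lab, logits_unlab, threshold=0.9, weight=1.0):
    """Supervised divergence loss plus a divergence loss on confident pseudo-labels.

    `threshold` and `weight` are hypothetical hyperparameters for this sketch.
    """
    num_classes = logits_lab.shape[-1]
    p_lab = softmax(logits_lab)
    p_unlab = softmax(logits_unlab)

    # Labeled part: divergence between true one-hot labels and predictions.
    supervised = tv_loss(one_hot(y_lab, num_classes), p_lab).mean()

    # Unlabeled part: hard pseudo-labels from the model's own predictions,
    # kept only where the top predicted probability reaches the threshold.
    pseudo = p_unlab.argmax(axis=-1)
    mask = p_unlab.max(axis=-1) >= threshold
    if mask.any():
        unsupervised = tv_loss(one_hot(pseudo[mask], num_classes), p_unlab[mask]).mean()
    else:
        unsupervised = 0.0
    return supervised + weight * unsupervised


def entropy_regularizer(logits_unlab):
    """Mean Shannon entropy of predictions on unlabeled data (to be minimized)."""
    p = softmax(logits_unlab)
    return -(p * np.log(np.maximum(p, 1e-12))).sum(axis=-1).mean()


if __name__ == "__main__":
    logits_lab = np.array([[10.0, 0.0], [0.0, 10.0]])
    y_lab = np.array([0, 1])
    logits_unlab = np.array([[5.0, 0.0], [0.0, 0.0]])  # second row is below threshold
    print("pseudo-labeling risk:", pseudo_label_risk(logits_lab, y_lab, logits_unlab))
    print("entropy regularizer:", entropy_regularizer(logits_unlab))
```

In a real training loop these quantities would be computed with an automatic-differentiation framework and minimized over the model parameters. The NumPy version here only shows how the labeled term, the pseudo-labeled term, and the entropy term fit together.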

Robust Semi-supervised Learning via $f$-Divergence and $α$-Rényi Divergence
