Label Smoothing is Robustification against Model Misspecification

Ryoya Yamasaki, Toshiyuki Tanaka

2023-05-15

Abstract: Label smoothing (LS) adopts smoothed targets in classification tasks. For example, in binary classification, instead of the one-hot target $(1,0)^\top$ used in conventional logistic regression (LR), LR with LS (LSLR) uses the smoothed target $(1-\frac{\alpha}{2},\frac{\alpha}{2})^\top$ with a smoothing level $\alpha\in(0,1)$, which causes squeezing of values of the logit. Apart from the common regularization-based interpretation of LS that leads to an inconsistent probability estimator, we regard LSLR as modifying the loss function and consistent estimator for probability estimation. In order to study the significance of each of these two modifications by LSLR, we introduce a modified LSLR (MLSLR) that uses the same loss function as LSLR and the same consistent estimator as LR, while not squeezing the logits. For the loss function modification, we theoretically show that MLSLR with a larger smoothing level has lower efficiency with correctly-specified models, while it exhibits higher robustness against model misspecification than LR. Also, for the modification of the probability estimator, an experimental comparison between LSLR and MLSLR showed that this modification and squeezing of the logits in LSLR have negative effects on the probability estimation and classification performance. The understanding of the properties of LS provided by these comparisons allows us to propose MLSLR as an improvement over LSLR.
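
The logit squeezing mentioned in the abstract can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes a single input with true class-1 probability `p`, a scalar logit `g`, and the standard cross-entropy loss against the smoothed target. The helper names (`smoothed_target`, `expected_lslr_loss`) and the values of `alpha` and `p` are illustrative choices.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def smoothed_target(y, alpha):
    """Map a hard label y in {0, 1} to the smoothed probability of class 1."""
    return y * (1.0 - alpha / 2.0) + (1.0 - y) * (alpha / 2.0)

def expected_lslr_loss(g, p, alpha):
    """Expected cross-entropy against smoothed targets when P(y=1 | x) = p."""
    # The loss is linear in the target, so averaging over y averages the target.
    t = p * smoothed_target(1, alpha) + (1.0 - p) * smoothed_target(0, alpha)
    q = sigmoid(g)
    return -(t * np.log(q) + (1.0 - t) * np.log(1.0 - q))

alpha, p = 0.2, 0.99  # assumed smoothing level and true probability

# Minimize the expected loss over a fine grid of logit values.
grid = np.linspace(-10.0, 10.0, 200001)
g_star = grid[np.argmin(expected_lslr_loss(grid, p, alpha))]

# The minimizing sigmoid output is (1 - alpha) * p + alpha / 2, not p itself.
squeezed = (1.0 - alpha) * p + alpha / 2.0
# Hence the optimal logit can never exceed log((2 - alpha) / alpha) in magnitude.
logit_bound = np.log((2.0 - alpha) / alpha)
# Undoing the affine map recovers p from the squeezed output.
p_recovered = (sigmoid(g_star) - alpha / 2.0) / (1.0 - alpha)

print(g_star, logit_bound, sigmoid(g_star), squeezed, p_recovered)
```

Under these assumptions, reading `sigmoid(g_star)` directly as a probability is biased toward 1/2, while the affine correction in `p_recovered` gives back `p`. This matches the abstract's distinction between an inconsistent and a consistent probability estimator.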

Machine Learning

What problem does this paper attempt to address?

The paper examines the mechanism and effects of label smoothing (LS) in classification tasks. Specifically:

1. **Mechanism Understanding**: Label smoothing has been widely adopted in modern applications and has shown various experimental benefits (such as preventing overconfidence, improving adversarial robustness, and enhancing generalization performance), but its underlying mechanism has not been fully explored.
2. **Improvement Method**: The paper proposes a modified label smoothing logistic regression (MLSLR), which uses the same loss function as LSLR and the same consistent estimator as LR, but does not squeeze the logit values. By comparing LSLR and MLSLR, the paper finds that modifying the probability estimator and squeezing the logit values have a negative rather than a positive effect on probability estimation and classification performance.
3. **Theoretical Analysis**: For correctly specified models (i.e., cases where the data distribution conforms to the linear model assumption), MLSLR and LSQLR in the limit case (i.e., when the smoothing level $\alpha$ approaches 1) show lower efficiency than standard logistic regression (LR), but exhibit higher robustness under model misspecification. This robustness may explain the good performance of LS in practical applications.
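
The contrast between LSLR and MLSLR can be illustrated with a small numerical sketch. The summary does not spell out the MLSLR loss, so the form used here is an assumption: the smoothing is applied to the model output as well, so that the predicted probability compared with the smoothed target is $(1-\alpha)\sigma(g)+\frac{\alpha}{2}$. The paper's exact definition may differ. The function names and the values of `alpha` and `p` are hypothetical.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def cross_entropy(t, q):
    return -(t * np.log(q) + (1.0 - t) * np.log(1.0 - q))

def lslr_loss(g, y, alpha):
    """Cross-entropy between the smoothed target and sigmoid(g)."""
    t = y * (1.0 - alpha / 2.0) + (1.0 - y) * (alpha / 2.0)
    return cross_entropy(t, sigmoid(g))

def mlslr_loss(g, y, alpha):
    """ASSUMED form: the model output is smoothed the same way as the target."""
    t = y * (1.0 - alpha / 2.0) + (1.0 - y) * (alpha / 2.0)
    q = (1.0 - alpha) * sigmoid(g) + alpha / 2.0
    return cross_entropy(t, q)

def population_minimizer(loss, p, alpha, grid):
    """Logit minimizing the expected loss when P(y=1 | x) = p."""
    risk = p * loss(grid, 1, alpha) + (1.0 - p) * loss(grid, 0, alpha)
    return grid[np.argmin(risk)]

alpha, p = 0.2, 0.9  # assumed smoothing level and true probability
grid = np.linspace(-10.0, 10.0, 200001)

g_lslr = population_minimizer(lslr_loss, p, alpha, grid)
g_mlslr = population_minimizer(mlslr_loss, p, alpha, grid)

# The unsqueezed logit of p, i.e. what plain LR targets.
g_true = np.log(p / (1.0 - p))
print(g_lslr, g_mlslr, g_true)
```

Under this assumed form, the MLSLR minimizer coincides with the plain logit of `p`, so `sigmoid(g)` can be used directly as the probability estimate, whereas the LSLR minimizer is pulled toward zero.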

Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It

Towards Understanding Label Smoothing

Label Smoothing and Adversarial Robustness

Delving Deep into Label Smoothing

Smooth Pseudo-Labeling

Learning label smoothing for text classification

Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks

Label Smoothing for Text Mining

Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation

Label Smoothing and Logit Squeezing: A Replacement for Adversarial Training?

Label Confusion Learning to Enhance Text Classification Models

Revisiting Label Smoothing Regularization with Knowledge Distillation

Rethinking Regularization with Random Label Smoothing

Label-Noise Robust Logistic Regression and Its Applications

Revisiting the Role of Label Smoothing in Enhanced Text Sentiment Classification

Regularized Label Relaxation Linear Regression

Rethinking Label Smoothing on Multi-Hop Question Answering

Loss factorization, weakly supervised learning and label noise robustness

Adaptive Label Smoothing for Out-of-Distribution Detection

Label Smoothing Improves Machine Unlearning