Abstract: Learning from noisy labels is a challenge that arises in many real-world applications where training data can contain incorrect or corrupted labels. When fine-tuning language models with noisy labels, models can easily overfit the label noise, leading to decreased performance. Most existing methods for learning from noisy labels use static input features for denoising, but these methods are limited by the information they can provide on true label distributions and can result in biased or incorrect predictions. In this work, we propose the Dynamics-Enhanced Generative Model (DyGen), which uses dynamic patterns in the embedding space during the fine-tuning process of language models to improve noisy label predictions. DyGen uses the variational auto-encoding framework to infer the posterior distributions of true labels from noisy labels and training dynamics. Additionally, a co-regularization mechanism is used to minimize the impact of potentially noisy labels and priors. DyGen demonstrates an average accuracy improvement of 3.10% on two synthetic noise datasets and 1.48% on three real-world noise datasets compared to the previous state-of-the-art. Extensive experiments and analyses show the effectiveness of each component in DyGen. Our code is available for reproducibility on GitHub.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the tendency of pre-trained language models (PLMs) to overfit noisy labels. When a language model is fine-tuned on data with noisy labels, it can fit the incorrect labels, and its performance declines. Existing methods usually rely on static input features for denoising, but these features carry limited information about the true label distribution and may lead to biased or incorrect predictions.

### Solutions

To solve these problems, the authors propose the Dynamics-Enhanced Generative Model (DyGen). DyGen uses the dynamic patterns in the embedding space during fine-tuning to improve predictions under noisy labels. Specifically, DyGen uses the variational auto-encoding framework to infer the posterior distribution of the true labels from the noisy labels and the training dynamics, and it reduces the influence of potentially noisy labels and priors through a co-regularization mechanism.

### Main contributions

1. **Discovering dynamic training patterns**: The authors find that, during the fine-tuning of pre-trained language models, noisy samples and clean samples behave differently in the embedding space. They build a denoising fine-tuning method on this finding.
2. **Designing a generative model**: The model is trained to reconstruct the noisy label \( \hat{y} \), while using the training dynamics \( w \) to infer the posterior distribution of the true label \( y \).
3. **Co-regularization mechanism**: The generative models of multiple branches regularize each other to make inference of the true label more robust.
4. **Experimental verification**: Extensive experiments on a variety of synthetic and real-world noisy datasets show that DyGen outperforms existing state-of-the-art methods and performs especially well under extreme noise ratios.

### Formula summary

- **Joint distribution decomposition**:
  \[ p(y, \hat{y} \mid w) = p(y \mid w) \, p(\hat{y} \mid y, w) \]
- **Generative process**:
  \[ y \sim \text{Dirichlet}(\alpha_w), \quad \hat{y} \sim \text{Multi}(\pi_{w, y}) \]
- **Variational lower bound (ELBO)**:
  \[ \text{ELBO} = \sum_{k = 1}^c \left[ \hat{y}_k \log \hat{y}_k^* + (1 - \hat{y}_k) \log (1 - \hat{y}_k^*) - \log \Gamma(\alpha_k^{x}) + \log \Gamma(\hat{\alpha}_k^{x, \hat{y}}) - (\hat{\alpha}_k^{x, \hat{y}} - \alpha_k^{x}) \, \psi(\hat{\alpha}_k^{x, \hat{y}}) \right] \]
- **Posterior distribution**:
  \[ H = q_\phi(y \mid \hat{y}, w) = \frac{\hat{\alpha}_{y, \hat{y}} - 1}{\sum_{y' = 1}^c \hat{\alpha}_{y', \hat{y}} - c} \]
- **Final prediction**:
  \[ p(y \mid x) \propto \sum_{k = 1}^c q_\phi(y \mid \hat{y} = k, w) \, p(\hat{y} = k \mid x) \]

### Conclusion

DyGen mitigates the overfitting of pre-trained language models on noisy-label data by combining training dynamics with a generative model, and it improves performance and robustness under various noise conditions.
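The posterior, final-prediction, and ELBO formulas in the summary can be checked numerically with a small sketch. The code below is illustrative only and is not taken from the authors' released implementation: the function names, the array layout (`alpha_hat[k, j]` holds the posterior concentration for true class `j` given noisy label `k`), and the hand-written `digamma` helper are all assumptions made for this example. The regularization term is evaluated exactly as written in the summary, not as a full Dirichlet KL divergence.

```python
import math
import numpy as np


def posterior_mode(alpha_hat):
    """Dirichlet mode per noisy label: H[k, j] = q(y = j | y_hat = k, w).

    alpha_hat[k, j] is an assumed layout; every entry must exceed 1.
    """
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    c = alpha_hat.shape[1]
    return (alpha_hat - 1.0) / (alpha_hat.sum(axis=1, keepdims=True) - c)


def final_prediction(H, p_noisy):
    """p(y | x) proportional to sum_k H[k, y] * p(y_hat = k | x)."""
    scores = np.asarray(p_noisy, dtype=float) @ H
    return scores / scores.sum()


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x  # psi(x) = psi(x + 1) - 1 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return shift + math.log(x) - 0.5 / x - series


def elbo(y_noisy, y_recon, alpha_prior, alpha_post):
    """Reconstruction term minus the regularizer from the formula summary.

    y_noisy is a one-hot noisy label, y_recon its reconstruction in (0, 1).
    """
    recon = sum(
        t * math.log(r) + (1.0 - t) * math.log(1.0 - r)
        for t, r in zip(y_noisy, y_recon)
    )
    reg = sum(
        math.lgamma(a) - math.lgamma(b) + (b - a) * digamma(b)
        for a, b in zip(alpha_prior, alpha_post)
    )
    return recon - reg


# Toy two-class example with made-up concentrations.
H = posterior_mode([[9.0, 2.0], [3.0, 5.0]])
p_true = final_prediction(H, [0.6, 0.4])
```

In this toy case each row of `H` is a probability vector over true classes, and `final_prediction` mixes those rows using the classifier's noisy-label probabilities. When the posterior concentrations equal the prior ones, the regularizer vanishes and `elbo` reduces to the reconstruction term alone.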

DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling
