Abstract: Learning from noisy labels is a challenge that arises in many real-world applications where training data can contain incorrect or corrupted labels. When fine-tuning language models with noisy labels, models can easily overfit the label noise, leading to decreased performance. Most existing methods for learning from noisy labels use static input features for denoising, but these methods are limited by the information they can provide on true label distributions and can result in biased or incorrect predictions. In this work, we propose the Dynamics-Enhanced Generative Model (DyGen), which uses dynamic patterns in the embedding space during the fine-tuning process of language models to improve noisy label predictions. DyGen uses the variational auto-encoding framework to infer the posterior distributions of true labels from noisy labels and training dynamics. Additionally, a co-regularization mechanism is used to minimize the impact of potentially noisy labels and priors. DyGen demonstrates an average accuracy improvement of 3.10% on two synthetic noise datasets and 1.48% on three real-world noise datasets compared to the previous state-of-the-art. Extensive experiments and analyses show the effectiveness of each component in DyGen. Our code is available for reproducibility on GitHub.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
This paper addresses the tendency of pre-trained language models (PLMs) to overfit when fine-tuned on data with noisy labels. Specifically, a model fine-tuned with noisy labels can fit the incorrect labels themselves, which degrades performance. Existing methods usually rely on static input features for denoising, but these features carry limited information about the true label distribution and can lead to biased or incorrect predictions.
### Solutions
To address this problem, the authors propose the Dynamics-Enhanced Generative Model (DyGen). DyGen uses dynamic patterns in the embedding space during fine-tuning to improve predictions under noisy labels. Specifically, it uses the variational auto-encoding framework to infer the posterior distribution of the true labels from the noisy labels and the training dynamics, and it uses a co-regularization mechanism to minimize the impact of potentially noisy labels and priors.
### Main contributions
1. **Discovering dynamic training patterns**: The authors find that, during fine-tuning of pre-trained language models, noisy and clean samples behave differently in the embedding space. They build a denoising fine-tuning method on this finding.
2. **Designing a generative model**: The model is trained to reconstruct the noisy label \(\hat{y}\), and uses the training dynamics \(w\) to infer the posterior distribution of the true label \(y\).
3. **Co-regularization mechanism**: Multiple branches of the generative model regularize one another, which makes inference of the true label more robust.
4. **Experimental verification**: Extensive experiments on a range of synthetic and real-world noisy datasets show that DyGen outperforms existing state-of-the-art methods, and that it performs particularly well under extreme noise ratios.
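The summary does not spell out how the branches regularize one another. One common way to express such an agreement penalty, shown here purely as an assumption and not as the authors' loss, is to average the branch predictions and penalize each branch's KL divergence from that average. The function names `kl_divergence` and `co_regularization_penalty` are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def co_regularization_penalty(branch_probs):
    """Average of KL(mean || branch) over all branches, for one sample.

    branch_probs: one class-probability list per branch.
    The averaging-plus-KL form is an assumption, not taken from the source.
    """
    n_branches = len(branch_probs)
    n_classes = len(branch_probs[0])
    # Consensus prediction: the element-wise mean over branches.
    mean = [sum(b[k] for b in branch_probs) / n_branches for k in range(n_classes)]
    return sum(kl_divergence(mean, b) for b in branch_probs) / n_branches
```

The penalty is zero when all branches agree and grows as they diverge, which is the behavior a mutual-regularization term would need.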
### Formula summary
- **Joint distribution decomposition**:
\[
p(y, \hat{y} | w) = p(y | w) p(\hat{y} | y, w)
\]
- **Generating process**:
\[
y \sim \text{Dirichlet}(\alpha_w), \quad \hat{y} \sim \text{Multi}(\pi_w, y)
\]
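The generative process can be simulated with a short sketch. It assumes, for illustration only, that \(\pi_w\) is a hypothetical row-stochastic matrix `pi_w` whose row `j` gives the noisy-label distribution when the true class is `j`, and that the noisy label is drawn from the rows of that matrix mixed by the latent \(y\). The source does not specify this parameterization.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_noisy_label(alpha_w, pi_w, rng):
    """Sample a latent label distribution y, then a noisy label index.

    alpha_w: Dirichlet parameters, one per class, for this sample.
    pi_w: hypothetical c x c row-stochastic matrix (an assumption).
    """
    y = sample_dirichlet(alpha_w, rng)
    c = len(y)
    # Mix the rows of pi_w by the latent label distribution y.
    noisy_probs = [sum(y[j] * pi_w[j][k] for j in range(c)) for k in range(c)]
    y_hat = rng.choices(range(c), weights=noisy_probs, k=1)[0]
    return y, y_hat


rng = random.Random(0)
pi_w = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
y, y_hat = sample_noisy_label([5.0, 1.0, 1.0], pi_w, rng)
```

Here `y` is a point on the probability simplex and `y_hat` is a class index in `0..c-1`.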
- **Variational lower bound (ELBO)**:
\[
\text{ELBO} = \sum_{k = 1}^c \left[ \hat{y}_k \log \hat{y}_k^* + (1 - \hat{y}_k) \log (1 - \hat{y}_k^*) - \log \Gamma(\alpha_{w,k}) + \log \Gamma(\hat{\alpha}_{k, \hat{y}}) - (\hat{\alpha}_{k, \hat{y}} - \alpha_{w,k}) \, \psi(\hat{\alpha}_{k, \hat{y}}) \right]
\]
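The bound can be evaluated numerically as a per-class reconstruction term minus a per-class Dirichlet regularizer built from log-gamma and digamma values. The sketch below assumes that \(\hat{y}^*\) holds the reconstructed noisy-label probabilities and that `alpha` and `alpha_hat` are the prior and posterior Dirichlet parameters. The helper `digamma` is a hand-written approximation, included only to keep the example free of third-party dependencies.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    # Shift x upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def elbo(y_hat, y_star, alpha, alpha_hat, eps=1e-12):
    """Reconstruction log-likelihood minus the per-class Dirichlet penalty.

    y_hat: one-hot noisy label; y_star: reconstructed probabilities.
    alpha / alpha_hat: prior / posterior Dirichlet parameters (all > 0).
    """
    total = 0.0
    for yk, sk, a, ah in zip(y_hat, y_star, alpha, alpha_hat):
        sk = min(max(sk, eps), 1.0 - eps)  # keep the logarithms finite
        recon = yk * math.log(sk) + (1.0 - yk) * math.log(1.0 - sk)
        penalty = math.lgamma(a) - math.lgamma(ah) + (ah - a) * digamma(ah)
        total += recon - penalty
    return total
```

When the posterior parameters equal the prior parameters the penalty vanishes and only the reconstruction term remains, which makes a convenient sanity check.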
- **Posterior distribution**:
\[
H = q_\phi(y | \hat{y}, w) = \frac{\hat{\alpha}_{y, \hat{y}} - 1}{\sum_{k = 1}^c \hat{\alpha}_{k, \hat{y}} - c}
\]
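This expression has the form of the mode of a Dirichlet distribution, which is defined when every parameter exceeds 1. A minimal sketch of that computation for a single noisy label follows. The function name `posterior_mode` is hypothetical.

```python
def posterior_mode(alpha_hat):
    """Mode of Dirichlet(alpha_hat); requires every parameter to exceed 1."""
    if any(a <= 1.0 for a in alpha_hat):
        raise ValueError("the Dirichlet mode needs all parameters > 1")
    c = len(alpha_hat)
    denom = sum(alpha_hat) - c
    return [(a - 1.0) / denom for a in alpha_hat]
```

For example, `posterior_mode([3.0, 2.0, 2.0])` gives `[0.5, 0.25, 0.25]`. Stacking one such vector per noisy label value yields a matrix with one row per \(\hat{y}\).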
- **Final prediction**:
\[
p(y | x)\propto\sum_{k = 1}^c q_\phi(y | \hat{y} = k, w) p(\hat{y} = k | x)
\]
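The final prediction marginalizes the posterior over the classifier's noisy-label prediction and then normalizes. The sketch below assumes `posterior[k][y]` stores \(q_\phi(y | \hat{y} = k, w)\) and `noisy_probs[k]` stores \(p(\hat{y} = k | x)\). Both names are illustrative.

```python
def final_prediction(posterior, noisy_probs):
    """Combine per-noisy-label posteriors with the classifier's prediction.

    posterior[k][y]: q(y | y_hat = k, w); noisy_probs[k]: p(y_hat = k | x).
    Returns a normalized distribution over the true label y.
    """
    c = len(noisy_probs)
    scores = [
        sum(posterior[k][y] * noisy_probs[k] for k in range(c))
        for y in range(c)
    ]
    total = sum(scores)
    return [s / total for s in scores]


pred = final_prediction([[0.8, 0.2], [0.3, 0.7]], [0.5, 0.5])
```

With these inputs the first class receives 0.8 × 0.5 + 0.3 × 0.5 = 0.55 of the probability mass and the second receives 0.45.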
### Conclusion
DyGen mitigates the overfitting of pre-trained language models to noisy-label data by combining training dynamics with a generative model, and improves performance and robustness under a range of noise conditions.