Abstract: Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning. However, the large-scale pre-training datasets, often inaccessible or too expensive to handle, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets and then effectively mitigate its impact on downstream tasks. Specifically, through extensive experiments with fully-supervised and image-text contrastive pre-training on synthetic noisy ImageNet-1K, YFCC15M, and CC12M datasets, we demonstrate that, while slight noise in pre-training can benefit in-domain (ID) performance, where the training and testing data share a similar distribution, it always deteriorates out-of-domain (OOD) performance, where the training and testing distributions are significantly different. These observations are agnostic to the scale of the pre-training dataset, pre-training noise type, model architecture, pre-training objective, downstream tuning method, and downstream application. We empirically ascertain that the reason behind this is that pre-training noise shapes the feature space differently. We then propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization, and that is applicable to both parameter-efficient and black-box tuning. We additionally conduct extensive experiments on popular vision and language models, including APIs, that are pre-trained with supervised and self-supervised objectives on realistic noisy data. Our analysis and results demonstrate the importance of this novel and fundamental research direction, which we term Noisy Model Learning.
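
As a rough illustration of what "synthetic noisy" labels can mean in the supervised setting, the sketch below flips a fixed fraction of class labels uniformly at random to a different class. The abstract does not specify the noise model, so symmetric (uniform) noise is an assumption here, and the function name `inject_symmetric_label_noise` and its parameters are hypothetical, not taken from the paper.

```python
import random


def inject_symmetric_label_noise(labels, num_classes, noise_ratio, seed=0):
    """Return a copy of `labels` with a `noise_ratio` fraction flipped.

    Assumption: symmetric noise, i.e. each selected label is replaced by a
    different class drawn uniformly from the remaining classes.
    """
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")

    rng = random.Random(seed)
    noisy = list(labels)

    # Choose which positions to corrupt, without replacement.
    num_flips = round(noise_ratio * len(noisy))
    flip_indices = rng.sample(range(len(noisy)), num_flips)

    for i in flip_indices:
        # A non-zero offset modulo num_classes guarantees a different class.
        offset = rng.randrange(1, num_classes)
        noisy[i] = (noisy[i] + offset) % num_classes
    return noisy


# Hypothetical usage: 1000 samples over 10 classes with 10% label noise.
clean = [i % 10 for i in range(1000)]
noisy = inject_symmetric_label_noise(clean, num_classes=10, noise_ratio=0.1)
changed = sum(c != n for c, n in zip(clean, noisy))
```

Because every selected label is forced to a different class, the number of changed labels equals the number of selected positions, which makes the effective noise ratio easy to control when comparing ID and OOD behavior across noise levels.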

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models

On the Edge of Benign Overfitting: Label Noise and Overparameterization Level

Benign Overfitting in Two-layer Convolutional Neural Networks

Noise is the Fatal Poison: A Noise-aware Network for Noisy Dataset Classification

How benign is benign overfitting?

Benign Overfitting for Two-layer ReLU Convolutional Neural Networks

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Benign Overfitting in Single-Head Attention

The Implicit Bias of Benign Overfitting

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Towards an Understanding of Benign Overfitting in Neural Networks

Analyze the Robustness of Classifiers under Label Noise

Learning with Noisy Labels Via Self-supervised Adversarial Noisy Masking

Learning with Noisy Foundation Models

Benign overfitting in linear regression

Bayesian statistics guided label refurbishment mechanism: Mitigating label noise in medical image classification

Two Wrongs Don't Make a Right: Combating Confirmation Bias in Learning with Label Noise

Label Noise: Correcting the Forward-Correction

Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

Error-Bounded Correction of Noisy Labels