Learning with Noisy Foundation Models
Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj
2024-01-01
Abstract: Foundation models are usually pre-trained on large-scale datasets and then
adapted to downstream tasks through tuning. However, the large-scale
pre-training datasets, which are often inaccessible or too expensive to handle,
can contain label noise that may adversely affect the generalization of the
model and pose unexpected risks. This paper is the first work to
comprehensively analyze the nature of noise in pre-training datasets and to
effectively mitigate its impact on downstream tasks.
Specifically, through extensive experiments with fully supervised and image-text
contrastive pre-training on synthetically noised ImageNet-1K, YFCC15M, and CC12M
datasets, we demonstrate that while slight noise in pre-training can benefit
in-domain (ID) performance, where the training and testing data share a similar
distribution, it always degrades out-of-domain (OOD) performance, where the
training and testing distributions differ significantly. These observations
hold regardless of the pre-training dataset scale, pre-training noise type,
model architecture, pre-training objective, downstream tuning method, and
downstream application. We empirically find that this happens because
pre-training noise shapes the feature space differently than clean pre-training
does. We then propose a tuning method, NMTune, that reshapes the feature space
through an affine transformation to mitigate the malignant effect of noise and
improve generalization; it is applicable in both parameter-efficient and
black-box tuning settings.
For evaluation, we additionally conduct extensive experiments on popular vision
and language models, including ones accessible only through APIs, that were
pre-trained with supervised or self-supervised objectives on realistic noisy
data. Our analysis and results demonstrate the importance of this novel and
fundamental research direction, which we term Noisy Model Learning.
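The abstract does not say how the synthetic noise was injected. As a hedged illustration only, one common way to corrupt a classification dataset such as ImageNet-1K is symmetric (uniform) label noise: a fixed fraction of labels is reassigned to a different class chosen uniformly at random. The helper `inject_symmetric_label_noise` and its parameters (`noise_ratio`, `seed`) are hypothetical and are not taken from the paper. For image-text datasets such as YFCC15M and CC12M, a comparable corruption would act on image-caption pairings instead, which this sketch does not cover.

```python
import random
from typing import List


def inject_symmetric_label_noise(
    labels: List[int], num_classes: int, noise_ratio: float, seed: int = 0
) -> List[int]:
    """Hypothetical helper (an assumption, not the paper's code): return a copy
    of `labels` in which a `noise_ratio` fraction of entries is flipped to a
    different class drawn uniformly at random."""
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")

    rng = random.Random(seed)  # private RNG so results are reproducible
    noisy = list(labels)
    n_noisy = int(round(noise_ratio * len(labels)))

    # Pick exactly n_noisy distinct positions to corrupt.
    for i in rng.sample(range(len(labels)), n_noisy):
        # Draw from the other num_classes - 1 classes so the label always changes:
        # sample in [0, num_classes - 2], then skip over the original label.
        new_label = rng.randrange(num_classes - 1)
        if new_label >= labels[i]:
            new_label += 1
        noisy[i] = new_label
    return noisy


if __name__ == "__main__":
    clean = [i % 10 for i in range(1000)]
    noisy = inject_symmetric_label_noise(clean, num_classes=10, noise_ratio=0.1)
    changed = sum(c != n for c, n in zip(clean, noisy))
    print(changed)  # 100
```

Sampling from the remaining classes, rather than from all classes, ensures that the realized noise rate equals `noise_ratio` exactly instead of being diluted by labels that happen to be redrawn as themselves.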