On the Generalization Ability of Unsupervised Pretraining

Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi
2024-03-12
Abstract: Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distributions and tasks in the pre-training and fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, ultimately affecting the generalization capabilities of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze the generalization bounds of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method during pre-training to further enhance the generalization of the fine-tuned model. Overall, our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the lack of a rigorous account of how unsupervised pre-training affects the generalization of a fine-tuned model. Existing theory does not adequately account for heterogeneity between the pre-training and fine-tuning stages, in both their data distributions and their tasks. The paper proposes a theoretical framework that identifies the critical factor governing how well knowledge learned during pre-training transfers to fine-tuning, and therefore how well the fine-tuned model generalizes on downstream tasks. The authors apply the framework to analyze generalization bounds in two scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Motivated by these findings, the paper also proposes a regularization method, applied during pre-training, to improve the generalization of the fine-tuned model.