On the Generalization Ability of Unsupervised Pretraining

Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi

2024-03-12

Abstract: Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of distributions and tasks across the pre-training and fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, ultimately affecting the generalization capabilities of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze the generalization bounds of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method during pre-training to further enhance the generalization of the fine-tuned model. Overall, our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.

Machine Learning

What problem does this paper attempt to address?

This paper addresses how unsupervised pre-training affects the generalization of the fine-tuned model. Existing theory does not adequately account for the case where the pre-training and fine-tuning stages differ in both task and data distribution. The paper proposes a theoretical framework that identifies the critical factor governing how well knowledge learned during pre-training transfers to fine-tuning, and hence how well the fine-tuned model generalizes on downstream tasks. The authors apply the framework to two scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. The paper also proposes a regularization method, applied during pre-training, to improve the generalization of the fine-tuned model, and aims to inform the design of more effective pre-training algorithms.

Generalization algorithm of multimodal pre-training model based on graph-text self-supervised training

Why Fine-grained Labels in Pretraining Benefit Generalization?

An Analysis of Unsupervised Pre-training in Light of Recent Advances

Towards Unsupervised Domain Generalization

On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness

Bi-tuning: Efficient Transfer from Pre-trained Models

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

Bi-tuning of Pre-trained Representations

Rethinking Supervised Pre-training for Better Downstream Transferring

Boost Supervised Pretraining for Visual Transfer Learning: Implications of Self-Supervised Contrastive Representation Learning

Can Fine-tuning Pre-trained Models Lead to Perfect NLP? A Study of the Generalizability of Relation Extraction

Why does the unsupervised pretraining encourage moderate-sparseness?

Statistical-mechanical analysis of pre-training and fine tuning in deep learning

A separability-based approach to quantifying generalization: which layer is best?

Why Unsupervised Deep Networks Generalize

Improved Fine-Tuning by Better Leveraging Pre-Training Data

Research On Pre-Training Method and Generalization Ability of Big Data Recognition Model of the Internet of Things

Unveiling the Generalization Power of Fine-Tuned Large Language Models

Is Large-Scale Pretraining the Secret to Good Domain Generalization?

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization