Abstract: With the ever-increasing complexity of large-scale pre-trained models and a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, fine-tuning large-scale pre-trained vision models still relies mostly on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing yet prevalent phenomenon, recently discovered in the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the class means of the features become maximally and equally separated. In this work, we examine the NC attributes of pre-trained models on both downstream and source data for transfer learning, and we find a strong correlation between feature collapse and downstream performance. In particular, we discover a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more collapsed the features of a pre-trained model are on the downstream training data, the higher the transfer accuracy. We also study the relationship between NC and transfer accuracy on the source data. These findings allow us to develop a principled, parameter-efficient fine-tuning method that employs a skip connection to induce last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver good performance while reducing the number of fine-tuned parameters by at least 90% and mitigating overfitting, especially when downstream data is scarce.
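To make "feature collapse" concrete, the sketch below computes one within-class variability measure that is common in the neural-collapse literature: the trace of the within-class covariance multiplied by the pseudo-inverse of the between-class covariance, divided by the number of classes. The abstract does not state which metric the authors use, so this formula, the function name `within_class_variability`, and the `rcond` cutoff are assumptions made for illustration only. Lower values mean more collapsed features.

```python
import numpy as np


def within_class_variability(features, labels):
    """Illustrative collapse measure: trace(Sigma_W @ pinv(Sigma_B)) / K.

    Assumed inputs (hypothetical, not taken from the paper):
      features: (n_samples, dim) array of last-layer features
      labels:   (n_samples,) array of integer class labels
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    num_classes = len(classes)
    n_samples, dim = features.shape

    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((dim, dim))  # within-class covariance
    sigma_b = np.zeros((dim, dim))  # between-class covariance

    for c in classes:
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        centered = class_feats - class_mean
        sigma_w += centered.T @ centered / n_samples
        diff = (class_mean - global_mean)[:, None]
        sigma_b += diff @ diff.T / num_classes

    # Sigma_B has rank at most K - 1, so a pseudo-inverse is used;
    # the explicit rcond (an arbitrary choice) discards round-off directions.
    sigma_b_pinv = np.linalg.pinv(sigma_b, rcond=1e-10)
    return float(np.trace(sigma_w @ sigma_b_pinv) / num_classes)
```

Under this assumed metric, the pattern reported in the abstract would mean that pre-trained models with smaller values on the downstream training features tend to reach higher linear-probing accuracy.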

Transferring Pretrained Networks to Small Data Via Category Decorrelation

Co-Tuning for Transfer Learning

Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning

Transfer learning in computer vision tasks: Remember where you come from

Deep Inhomogeneous Regularization for Transfer Learning

Fine-grained Category Discovery under Coarse-grained Supervision with Hierarchical Weighted Self-contrastive Learning

Towards Inadequately Pre-trained Models in Transfer Learning

Effective Domain Knowledge Transfer with Soft Fine-tuning

Overcoming Catastrophic Forgetting for Fine-Tuning Pre-trained GANs

Towards Making Deep Transfer Learning Never Hurt

Studying Catastrophic Forgetting in Neural Ranking Models

Gated Transfer Network for Transfer Learning

Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

Alleviating Representational Shift for Continual Fine-tuning

An Efficient Strategy for Catastrophic Forgetting Reduction in Incremental Learning

CARTL: Cooperative Adversarially-Robust Transfer Learning

On consequences of finetuning on data with highly discriminative features

Class-Incremental Learning Based on Big Dataset Pre-Trained Models

Domain Perceptive-Pruning and Fine-Tuning the Pre-Trained Model for Heterogeneous Transfer Learning in Cross Domain Prediction

Transferable Normalization: Towards Improving Transferability of Deep Neural Networks

Scaling Laws for Transfer