Abstract: With the ever-increasing complexity of large-scale pre-trained models and a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, fine-tuning large-scale pre-trained vision models still relies mostly on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing yet prevalent phenomenon recently discovered in the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, the variability of the features within each class diminishes to zero, while the class means of the features become maximally and equally distant from one another. In this work, we examine the NC attributes of pre-trained models on both downstream and source data for transfer learning, and we find a strong correlation between feature collapse and downstream performance. In particular, we discover a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more collapsed the features of a pre-trained model are on the downstream training data, the higher the transfer accuracy. We also study the relationship between NC and transfer accuracy on the source data. These findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip connections to induce last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver good performance while reducing the number of fine-tuned parameters by at least 90% and mitigating overfitting, especially when downstream data is scarce.
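To make "feature collapse" concrete, the following is a minimal sketch of one common way to quantify within-class variability collapse: the ratio trace(Σ_W Σ_B⁺) / K, where Σ_W and Σ_B are the within-class and between-class covariance matrices of the last-layer features and K is the number of classes. The abstract does not state which metric the authors use, so this choice, the function name `nc1_score`, and the array layout are assumptions made for illustration only. Lower values indicate more collapsed features.

```python
import numpy as np


def nc1_score(features, labels):
    """Within-class variability relative to between-class variability.

    Illustrative metric (an assumption, not taken from the abstract):
    trace(Sigma_W @ pinv(Sigma_B)) / K.

    features: array of shape (n_samples, dim), e.g. last-layer features.
    labels:   array of shape (n_samples,), integer class labels.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_samples, dim = features.shape
    global_mean = features.mean(axis=0)

    sigma_w = np.zeros((dim, dim))  # within-class covariance
    sigma_b = np.zeros((dim, dim))  # between-class covariance
    for c in classes:
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        centered = class_feats - class_mean
        sigma_w += centered.T @ centered / n_samples
        diff = (class_mean - global_mean)[:, None]
        sigma_b += diff @ diff.T / len(classes)

    # The pseudo-inverse is used because Sigma_B has rank at most K - 1.
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))
```

Under this sketch, features extracted from a frozen pre-trained model on the downstream training set would be passed in together with their labels, and the resulting score could be compared across candidate pre-trained models.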

How transfer learning impacts linguistic knowledge in deep NLP models?

Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models

Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models

On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets

Discovering Salient Neurons in Deep NLP Models

Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?

Exploring and Predicting Transferability across NLP Tasks

Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies

Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

Rethinking Two Consensuses of the Transferability in Deep Learning

Towards Understanding the Transferability of Deep Representations

Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models

Transfer Learning for Speech and Language Processing

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning

Commonsense Knowledge Transfer for Pre-trained Language Models

Knowledge Distillation Transfer Sets and their Impact on Downstream NLU Tasks

A new computationally efficient method to tune BERT networks – transfer learning

Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

How Transferable Are Neural Networks in NLP Applications?