Layer-wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-trained Models: An Evolutionary Approach
Yuxin Liu,Xindong Wu,Chenyang Bu,Shengwei Ji,Manzong Huang,Wenjian Luo,Jianxuan Shao
DOI: https://doi.org/10.1145/3689827
2024-08-24
ACM Transactions on Evolutionary Learning and Optimization
Abstract:The superior performance of large-scale pre-trained models, such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT), has received increasing attention in both academic and industrial research and has become one of the current research hotspots. A pre-trained model refers to a model trained on large-scale unlabeled data, whose purpose is to learn general language representation or features for fine-tuning or transfer learning in subsequent tasks. After pre-training is complete, a small amount of labeled data can be used to fine-tune the model for a specific task or domain. This two-stage method of “pre-training+fine-tuning” has achieved advanced results in natural language processing (NLP) tasks. Despite widespread adoption, existing fixed fine-tuning schemes that adapt well to one NLP task may perform inconsistently on other NLP tasks given that different tasks have different latent semantic structures. In this paper, we explore the effectiveness of automatic fine-tuning pattern search for layer-wise learning rates from an evolutionary optimization perspective. Our goal is to use evolutionary algorithms to search for better task-dependent fine-tuning patterns for specific NLP tasks than typical fixed fine-tuning patterns. Experimental results on two real-world language benchmarks and three advanced pre-training language models show the effectiveness and generality of the proposed framework.
Computer Science,Engineering