Output Layer Go First: Better Fine-tuning by Bridging the Gap with Pre-training

Tianxiong Xiao, Yuan Dong, Bin Dong
DOI: https://doi.org/10.1109/icnlp52887.2021.00030
2021-01-01
Abstract: Pre-trained Language Models (PLMs) have attracted a lot of attention in the natural language processing field. The paradigm of pre-training followed by fine-tuning is widely adopted for BERT-based architectures. However, due to the task gap between pre-training and fine-tuning, PLMs may suffer from knowledge forgetting during fine-tuning, which leads to worse performance than expected. We propose a new fine-tuning method to bridge this gap and improve the performance of PLMs. We first fine-tune only the task-specific output layer of the PLM while keeping the other layers' parameters constant, and then fine-tune all layers of the PLM as usual. Our approach is evaluated on multiple natural language understanding tasks and yields a significant improvement over a strong ELECTRA baseline. Specifically, it gains consistent improvements on tasks in GLUE and SQuAD2.0 with only a little additional computation.
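The two-stage schedule described in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the scalar "encoder" and "head" weights stand in for the pre-trained layers and the task-specific output layer, and the names `sgd_epoch`, `mse`, `two_stage_finetune`, the epoch counts, and the learning rate are all assumptions chosen for illustration.

```python
# Toy stand-in for a PLM: a "pre-trained" encoder weight and a freshly
# initialized task-specific output weight. Prediction: head * (enc * x).
# All names and hyperparameters here are hypothetical.

def sgd_epoch(params, data, lr, trainable):
    """One SGD pass on squared error, updating only the names in `trainable`."""
    for x, y in data:
        hidden = params["enc"] * x
        err = params["head"] * hidden - y
        # Gradients are computed before any parameter is updated.
        grads = {
            "enc": 2 * err * params["head"] * x,
            "head": 2 * err * hidden,
        }
        for name in trainable:
            params[name] -= lr * grads[name]


def mse(params, data):
    """Mean squared error of the toy model on `data`."""
    return sum((params["head"] * params["enc"] * x - y) ** 2 for x, y in data) / len(data)


def two_stage_finetune(params, data, head_epochs=5, full_epochs=20, lr=0.01):
    # Stage 1: train only the output layer; the encoder stays constant.
    for _ in range(head_epochs):
        sgd_epoch(params, data, lr, trainable=("head",))
    enc_after_stage1 = params["enc"]
    # Stage 2: fine-tune all parameters as usual.
    for _ in range(full_epochs):
        sgd_epoch(params, data, lr, trainable=("enc", "head"))
    return enc_after_stage1


data = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
params = {"enc": 1.5, "head": 0.1}  # "pre-trained" encoder, new output layer
loss_before = mse(params, data)
enc_after_stage1 = two_stage_finetune(params, data)
loss_after = mse(params, data)
```

In this sketch the encoder weight is untouched during stage 1 and only starts moving in stage 2, once the output layer has already adapted to the task. In a PyTorch setting, stage 1 would typically correspond to setting `requires_grad = False` on the encoder parameters and re-enabling gradients for stage 2; that mapping is an assumption about how one might implement the idea, not a detail taken from the paper.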