Abstract: Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed, but little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g. synonyms are closer to each other) and that fine-tuning further improves the capabilities that require more complicated reasoning (e.g. coreference resolution, entity boundary detection, etc.). However, verifying these arguments analytically and quantitatively is challenging, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g. coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. Using pairwise probing tasks built on these phenomena, we compare the performance of each layer's hidden representation in pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates the problem of distraction from inaccurate model training and enables a robust and quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) Fine-tuning has little effect on fundamental, low-level information and on general semantic tasks. (2) For the specific abilities required by downstream tasks, fine-tuned BERT is better than pre-trained BERT, and the gaps become obvious after the fifth layer.
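The layer-by-layer comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how pairs are scored, so everything here is an assumption made for illustration rather than the authors' implementation: spans are mean-pooled, pairs are scored with cosine similarity, and the helper names (`span_vector`, `pairwise_probe_accuracy`) and the toy hidden states are hypothetical. For each layer, the probe asks how often the matched phrase is closer to the anchor phrase than a distractor phrase is.

```python
import numpy as np


def span_vector(hidden, span):
    """Mean-pool token vectors over a half-open token span [start, end)."""
    start, end = span
    return hidden[start:end].mean(axis=0)


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def pairwise_probe_accuracy(layers, anchor, matched, distractors):
    """For each layer, return the fraction of distractor spans that score
    strictly lower than the matched span in similarity to the anchor.

    layers: list of arrays of shape (seq_len, dim), one per layer.
    anchor, matched: (start, end) token spans forming the matched pair.
    distractors: list of (start, end) spans that do not match the anchor.
    """
    scores = []
    for hidden in layers:
        a = span_vector(hidden, anchor)
        pos = cosine(a, span_vector(hidden, matched))
        wins = [pos > cosine(a, span_vector(hidden, d)) for d in distractors]
        scores.append(sum(wins) / len(wins))
    return scores


# Toy hidden states (hypothetical, hand-built): 6 tokens, 4 dimensions.
e = np.eye(4)
# Layer A: anchor (tokens 0-1) and matched span (tokens 2-3) share a direction.
layer_a = np.stack([e[0], e[0], e[0], e[0], e[1], e[2]])
# Layer B: the matched span is orthogonal; one distractor aligns with the anchor.
layer_b = np.stack([e[0], e[0], e[1], e[1], e[0], e[2]])

scores = pairwise_probe_accuracy(
    [layer_a, layer_b],
    anchor=(0, 2),
    matched=(2, 4),
    distractors=[(4, 5), (5, 6)],
)
print(scores)  # -> [1.0, 0.0]
```

Running the same function on the per-layer hidden states of a pre-trained and a fine-tuned model, and comparing the two score lists layer by layer, would give the kind of contrast the abstract describes. Because the score depends only on the relative ordering of pairs within one layer, no probe classifier has to be trained on top of the representations.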

A fine-tuning approach research of pre-trained model with two stage

Can Fine-tuning Pre-trained Models Lead to Perfect NLP? A Study of the Generalizability of Relation Extraction.

How to Fine-Tune BERT for Text Classification?

A Closer Look at How Fine-tuning Changes BERT

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation

KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier

Revisiting k-NN for Fine-Tuning Pre-trained Language Models

Layer-wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-trained Models: An Evolutionary Approach

Single task fine-tune BERT for text classification

Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models

Fine-Tuning BERT for Sentiment Analysis of Vietnamese Reviews

HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation

LayerNorm: A key component in parameter-efficient fine-tuning

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension

Fine-tuning large neural language models for biomedical natural language processing

Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data

Optimization Techniques for Sentiment Analysis Based on LLM (GPT-3)

Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

BERTer: The Efficient One