FreeLM: Fine-Tuning-Free Language Model

Xiang Li, Xin Jiang, Xuying Meng, Aixin Sun, Yequan Wang
2023-05-03
Abstract: Pre-trained language models (PLMs) have achieved remarkable success in NLP tasks. Despite the great success, mainstream solutions largely follow the pre-training then fine-tuning paradigm, which brings in both high deployment costs and low training efficiency. Nevertheless, fine-tuning on a specific task is essential because PLMs are only pre-trained with language signal from large raw data. In this paper, we propose a novel fine-tuning-free strategy for language models, to consider both language signal and teacher signal. Teacher signal is an abstraction of a battery of downstream tasks, provided in a unified proposition format. Trained with both language and strong task-aware teacher signals in an interactive manner, our FreeLM model demonstrates strong generalization and robustness. FreeLM outperforms large models, e.g., GPT-3 and InstructGPT, on a range of language understanding tasks in experiments. FreeLM is much smaller, with 0.3B parameters, compared to 175B in these models.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the high deployment cost and low training efficiency of the pre-training then fine-tuning paradigm that most current pre-trained language models (PLMs) follow. PLMs are pre-trained only on the language signal in large raw corpora, so fine-tuning on each specific task has been treated as essential; without it, even very large language models often perform unsatisfactorily on specific tasks. To address this, the paper proposes FreeLM, a language model that does not require fine-tuning. FreeLM is trained on two signals: the language signal from raw text and a teacher signal, which abstracts a set of downstream tasks into a unified proposition format. Training alternates between the raw language data and the unified data built from multiple predefined tasks, with the aim of improving generalization and robustness while reducing training and deployment costs. In the reported experiments, FreeLM outperforms large models, including GPT-3 and InstructGPT, on a range of language understanding tasks, with only 0.3 billion (300 million) parameters compared with 175 billion in those models.
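The two ideas in this answer, a unified proposition format and alternating training, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `to_propositions` and `alternating_schedule`, the template string, and the one-for-one alternation are all hypothetical choices made for illustration, since the summary does not specify the actual template or mixing schedule.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple


def to_propositions(
    text: str,
    gold_label: str,
    label_set: List[str],
    template: str = "{text} This is {label}.",  # hypothetical template, not from the paper
) -> List[Tuple[str, bool]]:
    """Turn one labeled example into (proposition, is_true) pairs, one per candidate label."""
    return [
        (template.format(text=text, label=label), label == gold_label)
        for label in label_set
    ]


def alternating_schedule(
    language_batches: Iterable, teacher_batches: Iterable
) -> Iterator[Tuple[str, object]]:
    """Interleave the two batch streams, tagging each batch with its signal.

    Assumption: strict one-for-one alternation; when one stream runs out,
    the remaining batches of the other stream are yielded on their own.
    """
    missing = object()  # sentinel marking an exhausted stream
    pairs = zip_longest(language_batches, teacher_batches, fillvalue=missing)
    for lm_batch, teacher_batch in pairs:
        if lm_batch is not missing:
            yield ("language", lm_batch)
        if teacher_batch is not missing:
            yield ("teacher", teacher_batch)


# One sentiment example becomes one proposition per candidate label.
props = to_propositions(
    "The film was a delight.", "positive", ["positive", "negative"]
)

# Raw-text batches and proposition batches are visited in turn.
steps = list(alternating_schedule(["raw_0", "raw_1"], [props]))
```

In a real training loop, a "language" step would apply the usual language-modeling loss to the raw-text batch, and a "teacher" step would train the model to judge each proposition as true or false. Casting every task as the same true/false judgment is what lets one model cover several tasks without a task-specific fine-tuning stage.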