FreeLM: Fine-Tuning-Free Language Model

Xiang Li, Xin Jiang, Xuying Meng, Aixin Sun, Yequan Wang

2023-05-03

Abstract: Pre-trained language models (PLMs) have achieved remarkable success in NLP tasks. Despite this success, mainstream solutions largely follow the pre-training then fine-tuning paradigm, which brings both high deployment costs and low training efficiency. Nevertheless, fine-tuning on a specific task is essential, because PLMs are pre-trained only with the language signal from large raw corpora. In this paper, we propose a novel fine-tuning-free strategy for language models that considers both the language signal and a teacher signal. The teacher signal is an abstraction of a battery of downstream tasks, provided in a unified proposition format. Trained with both the language signal and strong task-aware teacher signals in an interactive manner, our FreeLM model demonstrates strong generalization and robustness. In experiments, FreeLM outperforms large models, e.g., GPT-3 and InstructGPT, on a range of language understanding tasks. FreeLM is also much smaller, with 0.3B parameters compared with 175B in these models.
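
The abstract describes the teacher signal only as downstream tasks recast in a "unified proposition format". As a hedged sketch of what such a conversion could look like, the Python below expands one labeled classification example into true/false propositions, one per candidate label, so that different tasks share a single "is this statement correct?" form. The `Proposition` class, the `to_propositions` helper and the text template are hypothetical illustrations, not the template used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A task instance rewritten as a statement plus a truth label."""
    text: str
    is_true: bool


def to_propositions(task: str, inputs: str, candidates: list[str], gold: str) -> list[Proposition]:
    """Expand one labeled example into one proposition per candidate label.

    Hypothetical template: "Task: <task>. Input: <inputs> Answer: <label>."
    Only the proposition built from the gold label is marked true.
    """
    if gold not in candidates:
        raise ValueError(f"gold label {gold!r} is not among the candidates")
    return [
        Proposition(
            text=f"Task: {task}. Input: {inputs} Answer: {label}.",
            is_true=(label == gold),
        )
        for label in candidates
    ]


if __name__ == "__main__":
    # A sentiment example becomes one true and one false proposition.
    for prop in to_propositions("sentiment", "A wonderful, heartfelt film.",
                                ["positive", "negative"], "positive"):
        print(prop.is_true, prop.text)
```

Because every task reduces to judging a statement, a single binary objective can cover many task types; the trade-off is that an example with k candidate labels yields k propositions.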

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The main problem this paper addresses is the high deployment cost and low training efficiency of current pre-trained language models (PLMs) in practical applications. Although these models have achieved significant success in natural language processing tasks, they are typically pre-trained on large amounts of data with substantial computational resources and then fine-tuned for each specific task. This raises deployment costs and lowers training efficiency. Yet fine-tuning remains necessary, because a model pre-trained only on the language signal from raw text often performs unsatisfactorily on specific tasks without it.

To address these issues, the paper proposes FreeLM, a language model that does not require fine-tuning. FreeLM is trained interactively on two signals: the language signal from raw text, and a teacher signal, which abstracts a series of downstream tasks into a unified proposition format. The aim is to improve the model's generalization and robustness while reducing training and deployment costs. Concretely, training alternates between learning from raw language data and learning from unified data built from multiple predefined tasks, so that the resulting model performs well on a range of language understanding tasks without task-specific fine-tuning. Experiments show that FreeLM outperforms much larger models, including GPT-3 and InstructGPT, on multiple language understanding tasks, despite having only 0.3 billion parameters compared with their 175 billion.
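
The summary states only that training alternates between raw language data and unified task data. The Python below is a minimal sketch of such an alternating schedule under stated assumptions: a fixed round-robin of `language_steps` language batches followed by `teacher_steps` teacher batches, a mean negative log-likelihood for the language phase, and binary cross-entropy on proposition truth for the teacher phase. The helper names, the schedule and both loss forms are hypothetical choices for illustration; no model is trained here, and this is not the authors' implementation.

```python
import math
from typing import Any, Iterable, Iterator, Tuple

_DONE = object()  # sentinel marking an exhausted stream


def alternate(language_batches: Iterable[Any], teacher_batches: Iterable[Any],
              language_steps: int = 1, teacher_steps: int = 1) -> Iterator[Tuple[str, Any]]:
    """Interleave the two streams in a fixed round-robin pattern (assumed schedule).

    Yields ("language", batch) or ("teacher", batch); stops as soon as the
    stream due to supply the next batch runs out.
    """
    lang_it, teach_it = iter(language_batches), iter(teacher_batches)
    while True:
        for phase, it, steps in (("language", lang_it, language_steps),
                                 ("teacher", teach_it, teacher_steps)):
            for _ in range(steps):
                batch = next(it, _DONE)
                if batch is _DONE:
                    return
                yield phase, batch


def lm_loss(token_probs: list[float]) -> float:
    """Language phase: mean negative log-likelihood of the observed next tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def teacher_loss(p_true: float, is_true: bool) -> float:
    """Teacher phase: binary cross-entropy for judging whether a proposition holds."""
    return -math.log(p_true if is_true else 1.0 - p_true)


if __name__ == "__main__":
    lang = [[0.5, 0.25], [0.9]]          # toy probabilities of observed next tokens
    teach = [(0.8, True), (0.3, False)]  # toy (P(true), gold label) per proposition
    for phase, batch in alternate(lang, teach):
        loss = lm_loss(batch) if phase == "language" else teacher_loss(*batch)
        print(f"{phase:8s} loss={loss:.4f}")
```

In a real setup each phase would update the same shared model with its own loss; keeping the schedule separate from the loss functions makes it easy to try other language-to-teacher ratios.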

A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models

TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Parameter-efficient fine-tuning of large-scale pre-trained language models

Fine-Tuning Large Language Models in Education

Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

LMTuner: An user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models

An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts

LIMA: Less Is More for Alignment

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models

Fine-grained LLM Agent: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback

Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning

Fine-tuning protein language models boosts predictions across diverse tasks

Crafting Efficient Fine-Tuning Strategies for Large Language Models

Learning Global Controller in Latent Space for Parameter-Efficient Fine-Tuning

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data