Abstract: Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by training only a minimal number of parameters. Different PET methods use different manually designed tunable modules. In small PLMs, there are usually noticeable performance differences among PET methods. As the model scale increases, however, these differences become marginal. Hence, we hypothesize that model scaling mitigates the impact of design differences on PET methods. To investigate this hypothesis, we introduce a more flexible PET method called the Arbitrary PET (APET) method. The APET method is compatible with a tunable module that consists of any number of parameters distributed in arbitrary positions. We then use it to conduct experiments on 11 NLP tasks across 3 representative PLMs. Our investigations reveal that model scaling (1) mitigates the effect of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning while optimizing fewer tunable parameters. Intriguingly, we also observe that, across different tasks, tuning methods need to optimize a similar number of tunable parameters to exceed random-guess performance. We discuss this phenomenon together with the two aforementioned findings from an optimization perspective to understand the underlying mechanisms. These conclusions enhance our understanding of the impact of model scaling on PET and assist in designing more effective and efficient PET methods for PLMs of different scales. The source code is available in this GitHub repository: https://github.com/yushengsu-thu/PET_Scaling.
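
The idea of a tunable module made of "any number of parameters distributed in arbitrary positions" can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it assumes a flat list of weights, uniform random selection of positions, and a plain SGD step. The names `sample_tunable_positions` and `masked_sgd_step`, and the gradient values, are hypothetical and chosen only for illustration.

```python
import random


def sample_tunable_positions(num_params, num_tunable, seed=0):
    """Pick `num_tunable` distinct indices at random from a flat parameter vector.

    Assumption: positions are drawn uniformly; the abstract only says "arbitrary".
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_params), num_tunable))


def masked_sgd_step(params, grads, tunable, lr=0.1):
    """Apply one SGD update at the tunable positions only; all others stay frozen."""
    updated = list(params)
    for i in tunable:
        updated[i] = params[i] - lr * grads[i]
    return updated


# Toy "model": 10 weights, 3 of which are tunable at arbitrary positions.
params = [1.0] * 10
grads = [0.5] * 10  # hypothetical gradients, for illustration only
tunable = sample_tunable_positions(len(params), 3, seed=0)
new_params = masked_sgd_step(params, grads, tunable)

# Indices whose value changed after the step.
changed = [i for i, (old, new) in enumerate(zip(params, new_params)) if old != new]
```

After the step, `changed` contains exactly the sampled positions, while every other weight keeps its pre-trained value. Varying `num_tunable` and the seed corresponds, in this toy setting, to varying the number and positions of tunable parameters.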

Enhancing Scalability of Pre-trained Language Models Via Efficient Parameter Sharing

Scaling Pre-trained Language Models to Deeper Via Parameter-efficient Architecture

Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models

Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models

bert2BERT: Towards Reusable Pretrained Language Models

Small Pre-trained Language Models Can Be Fine-tuned As Large Models Via Over-Parameterization

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

Arbitrary Few Parameters Are Good Enough for Adapting Large-scale Pre-trained Language Models

GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost

Exploring Extreme Parameter Compression for Pre-trained Language Models

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

Parameter-Efficient Adapter Based on Pre-trained Models for Speech Translation

Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

Understanding Parameter Sharing in Transformers

Parameter-efficient Tuning for Large Language Model Without Calculating Its Gradients

Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators

Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning

CPM-2: Large-scale Cost-effective Pre-trained Language Models

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism