Parameter-Efficient Fine-Tuning With Adapters

Keyu Chen, Yuan Pang, Zi Yang
2024-05-09
Abstract: In language model fine-tuning, traditional approaches such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT), although effective, are computationally intensive. This research introduces a novel adaptation method that uses the UniPELT framework as a base and adds a Prompt Tuning layer, which significantly reduces the number of trainable parameters while maintaining competitive performance across various benchmarks. Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters. We evaluate our approach on three diverse datasets: the GLUE benchmark, a domain-specific dataset comprising four distinct domains, and the Stanford Question Answering Dataset 1.1 (SQuAD). Our results demonstrate that our customized adapter-based method achieves performance comparable to full model fine-tuning, DAPT+TAPT, and UniPELT while requiring fewer or an equivalent number of parameters. This parameter efficiency not only alleviates the computational burden but also expedites the adaptation process. The study underlines the potential of adapters to achieve high performance with significantly reduced resource consumption, suggesting a promising direction for future research in parameter-efficient fine-tuning.
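To make the parameter savings concrete, here is a toy, pure-Python sketch of the kind of layer the abstract describes: a frozen weight with a gated LoRA branch and a gated bottleneck adapter (two of UniPELT's components), plus a small set of trainable prompt vectors prepended to the input. It is not the authors' implementation. The class and function names (`GatedPELTLayer`, `PromptTuning`, `trainable_count`), the sizes, the zero initialisation, and the scalar sigmoid gates are all assumptions for illustration, and prefix-tuning, UniPELT's third component, is omitted.

```python
import math
import random

# Toy sketch only: names, sizes, initialisation and the gating form are
# assumptions; prefix-tuning (the third UniPELT component) is omitted.


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def randn(rows, cols, rng, scale=0.02):
    """Small Gaussian-initialised matrix (list of rows)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


class GatedPELTLayer:
    """Frozen weight W0 plus a gated LoRA branch and a gated bottleneck adapter."""

    def __init__(self, d, lora_rank, bottleneck, rng):
        self.W0 = randn(d, d, rng)              # frozen "pretrained" weight, never updated
        self.lora_A = randn(lora_rank, d, rng)  # trainable LoRA down-projection
        self.lora_B = zeros(d, lora_rank)       # trainable, zero-init so B @ A = 0 at start
        self.down = randn(bottleneck, d, rng)   # trainable adapter down-projection
        self.up = zeros(d, bottleneck)          # trainable adapter up-projection, zero-init
        self.gate_lora = [0.0] * d              # trainable gate weights (hypothetical form)
        self.gate_adapter = [0.0] * d

    def trainable_params(self):
        mats = (self.lora_A, self.lora_B, self.down, self.up)
        return (sum(len(m) * len(m[0]) for m in mats)
                + len(self.gate_lora) + len(self.gate_adapter))

    def forward(self, x):
        h = matvec(self.W0, x)
        # LoRA: W0 x + g * B(A x), with scalar gate g = sigmoid(gate_lora . x)
        g_lora = sigmoid(dot(self.gate_lora, x))
        delta = matvec(self.lora_B, matvec(self.lora_A, x))
        h = [hi + g_lora * di for hi, di in zip(h, delta)]
        # Bottleneck adapter: h + g * up(ReLU(down(h)))
        z = [max(0.0, v) for v in matvec(self.down, h)]
        g_adapter = sigmoid(dot(self.gate_adapter, h))
        return [hi + g_adapter * ai for hi, ai in zip(h, matvec(self.up, z))]


class PromptTuning:
    """n_tokens trainable vectors prepended to the frozen input embeddings."""

    def __init__(self, n_tokens, d, rng):
        self.prompts = randn(n_tokens, d, rng)

    def trainable_params(self):
        return len(self.prompts) * len(self.prompts[0])

    def __call__(self, token_embeddings):
        return self.prompts + list(token_embeddings)


def trainable_count(d, lora_rank, bottleneck, n_prompt):
    """Trainable parameters of one GatedPELTLayer plus the prompt (no biases)."""
    return 2 * d * lora_rank + 2 * d * bottleneck + 2 * d + n_prompt * d


if __name__ == "__main__":
    rng = random.Random(0)
    d = 64
    layer = GatedPELTLayer(d, lora_rank=4, bottleneck=8, rng=rng)
    prompt = PromptTuning(n_tokens=4, d=d, rng=rng)
    sequence = prompt(randn(6, d, rng))  # 4 prompt vectors + 6 token vectors
    outputs = [layer.forward(x) for x in sequence]
    print(len(outputs), "positions; frozen:", d * d,
          "trainable:", layer.trainable_params() + prompt.trainable_params())
```

Because the LoRA `B` matrix and the adapter up-projection start at zero, the layer initially reproduces the frozen `W0 x` exactly, so training starts from the pretrained behaviour. The trainable count grows linearly in the hidden size `d` while the frozen weight grows quadratically: with assumed RoBERTa-base-like sizes (d = 768, LoRA rank 8, bottleneck 48, 10 prompt vectors), `trainable_count` gives 95,232, compared with 589,824 weights in a single frozen 768 × 768 projection. An IA3 variant would swap the low-rank update for element-wise rescaling by a learned vector initialised to ones, costing `d` parameters instead of `2·d·r` in this toy. Since the toy has no self-attention, the prepended prompt vectors do not influence the other positions here; in a real transformer they do so through attention.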
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to reduce the number of trainable parameters when fine-tuning language models, lowering computational resource consumption while keeping performance comparable to full model fine-tuning. Traditional approaches such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT) are effective but computationally expensive. The paper proposes an adapter-based method that adds a Prompt Tuning layer on top of the UniPELT framework, significantly reducing the number of trainable parameters while remaining competitive across multiple benchmarks.

### Main Research Objectives:

1. **Reduce the number of parameters**: Transfer the pretrained model to new tasks through adapters, without retraining the bulk of the base model's parameters.
2. **Maintain performance**: Keep performance across tasks comparable to full model fine-tuning, DAPT+TAPT, and UniPELT while using fewer parameters.
3. **Improve efficiency**: Reduce computational resource consumption and speed up the adaptation process.

### Research Methods:

- **Datasets**: Three datasets were used: the GLUE benchmark, a domain-specific collection covering four domains (biomedical, computer science, news, and reviews), and the Stanford Question Answering Dataset (SQuAD 1.1).
- **Model Selection**: RoBERTa-base served as the base model, with the UniPELT framework applied on top of it.
- **Experimental Setup**: Several adapter configurations were compared:
  - the basic UniPELT framework;
  - UniPELT with an added Prompt Tuning layer;
  - UniPELT with the LoRA component replaced by IA3;
  - a three-layer stacked UniPELT.

### Experimental Results:

- **GLUE Benchmark**: The proposed adapter method achieved performance comparable to full model fine-tuning on multiple tasks while training fewer parameters.
- **Domain-Specific Datasets**: In domains such as biomedical and computer science, the adapter method showed notable performance improvements when vocabulary overlap was low.
- **SQuAD Dataset**: The adapter method also performed well on this question-answering task, although in some cases it was slightly below full model fine-tuning.

### Conclusion:

By introducing the adapter method, the paper maintains high model performance while reducing the number of parameters, with particularly strong results on the domain-specific datasets. This points to a new direction for future research on parameter-efficient fine-tuning.
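The domain-specific results are tied to low vocabulary overlap. One common way to quantify overlap, similar in spirit to the measure used in the original DAPT/TAPT work, is the share of the top 10K most frequent non-stopword words that two corpora have in common. The sketch below is a hypothetical illustration, not this paper's procedure: the regex tokenizer, the tiny `STOPWORDS` set, the function names, and the normalisation by the smaller vocabulary (so toy corpora with fewer than `k` word types still yield a value in [0, 100]) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real analysis would use a standard list.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to", "with"}


def top_k_vocab(docs, k):
    """Return the k most frequent lowercase non-stopword word types in docs."""
    counts = Counter(
        tok
        for doc in docs
        for tok in re.findall(r"[a-z]+", doc.lower())
        if tok not in STOPWORDS
    )
    return {word for word, _ in counts.most_common(k)}


def vocab_overlap(docs_a, docs_b, k=10_000):
    """Percentage of top-k word types shared by two corpora.

    Normalised by the smaller of the two vocabularies (an assumption).
    """
    vocab_a, vocab_b = top_k_vocab(docs_a, k), top_k_vocab(docs_b, k)
    denom = min(len(vocab_a), len(vocab_b))
    if denom == 0:
        return 0.0
    return 100.0 * len(vocab_a & vocab_b) / denom


if __name__ == "__main__":
    bio = ["The protein binds the receptor", "Gene expression of the protein"]
    cs = ["The model learns a protein embedding",
          "The model is trained with gradient descent"]
    print(vocab_overlap(bio, cs))  # 20.0 (only "protein" is shared)
```

On real corpora the same function would be run on large document samples per domain, where each top-k vocabulary actually reaches `k` entries and the smaller-vocabulary normalisation reduces to dividing by `k`.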