Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become the standard for customising Foundation Models (FMs) to user-specific downstream tasks. However, typical PEFT methods require storing multiple task-specific adapters, creating scalability issues as these adapters must be housed and run at the FM server. Traditional prompt tuning offers a potential solution by customising FMs through task-specific input prefixes, but it under-performs compared to other PEFT methods like LoRA. To address this gap, we propose Low-Rank Prompt Adaptation (LoPA), a prompt-tuning-based approach that performs on par with state-of-the-art PEFT methods and full fine-tuning while being more parameter-efficient and not requiring a server-based adapter. LoPA generates soft prompts by balancing between sharing task-specific information across instances and customisation for each instance. It uses a low-rank decomposition of the soft-prompt component encoded for each instance to achieve parameter efficiency. We provide a comprehensive evaluation on multiple natural language understanding, code generation, and code understanding tasks across a wide range of foundation models of varying sizes.
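The abstract describes a task-level soft prompt shared across instances and a per-instance component built from a low-rank decomposition. The NumPy sketch below shows one way such a prompt could be constructed and prepended to the input. It is an illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- The instance encoding is an h-dimensional vector from some frozen encoder.
- Bias-free linear heads `W_u` and `W_v` produce the factors U and V.
- The two components are combined by an elementwise sigmoid gate.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class LoPASketch:
    """Hypothetical LoPA-style soft-prompt generator (illustration only)."""

    def __init__(self, m: int, d: int, r: int, h: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.m, self.d, self.r = m, d, r
        # Task-specific soft prompt Z_S (m x d), shared by all instances.
        self.Z_S = rng.normal(scale=0.02, size=(m, d))
        # Hypothetical bias-free heads mapping an h-dim encoding to U and V.
        self.W_u = rng.normal(scale=0.1, size=(h, m * r))
        self.W_v = rng.normal(scale=0.1, size=(h, r * d))

    def instance_component(self, encoding: np.ndarray) -> np.ndarray:
        # Low-rank factors U (m x r) and V (r x d); their product has rank <= r.
        U = (encoding @ self.W_u).reshape(self.m, self.r)
        V = (encoding @ self.W_v).reshape(self.r, self.d)
        return U @ V

    def soft_prompt(self, encoding: np.ndarray) -> np.ndarray:
        # Assumed gate: an elementwise sigmoid of Z_I rescales the shared prompt.
        return self.Z_S * sigmoid(self.instance_component(encoding))

    def prepend(self, encoding: np.ndarray, input_embeds: np.ndarray) -> np.ndarray:
        # Prefix the (n x d) input embeddings; the frozen FM itself is unchanged.
        return np.concatenate([self.soft_prompt(encoding), input_embeds], axis=0)


rng = np.random.default_rng(1)
lopa = LoPASketch(m=4, d=32, r=2, h=16)
enc_a, enc_b = rng.normal(size=16), rng.normal(size=16)  # stand-ins for encoder outputs
embeds = rng.normal(size=(7, 32))                          # stand-in for FM input embeddings

Z_a = lopa.soft_prompt(enc_a)
full = lopa.prepend(enc_a, embeds)
print(Z_a.shape, full.shape)  # (4, 32) (11, 32)
```

Making `soft_prompt` always return `Z_S` would recover standard prompt tuning, where every input of a task receives the same prefix. In this sketch, the gate is what lets the prompt vary per instance while keeping a task-level component. The only thing sent to the model is a longer input sequence, so no adapter weights are needed on the FM side.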

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the scalability and performance issues that Parameter-Efficient Fine-Tuning (PEFT) methods face when customizing Foundation Models (FMs). Specifically:

1. **Scalability issue**: Typical PEFT methods require storing a separate task-specific adapter for each task, and these adapters must be housed and run on the FM server. Storage and operational costs therefore grow with the number of user-specific tasks.
2. **Performance issue**: Traditional prompt tuning is highly parameter-efficient, but its performance is often inferior to other PEFT methods such as Low-Rank Adaptation (LoRA).

To tackle these issues, the authors propose Low-Rank Prompt Adaptation (LoPA). LoPA generates soft prompts by balancing task-specific information shared across instances with instance-specific customization. It combines the two through a gating function and applies a low-rank decomposition to the instance-specific component to reduce the number of parameters. This lets it achieve performance comparable to state-of-the-art PEFT methods while remaining parameter-efficient.

### Main Contributions

1. **Proposing LoPA**: a parameter-efficient, high-performing prompt tuning strategy.
2. **Validating effectiveness**: extensive experiments on natural language understanding, code understanding, and code generation tasks. LoPA outperforms existing prompt tuning methods on multiple tasks and, in some cases, even surpasses full fine-tuning and LoRA.

### Experimental Results

- **Natural language understanding**: On six tasks from the GLUE benchmark, LoPA outperforms traditional prompt tuning and DePT by an average of 28.62 and 25.39 percentage points, respectively. It also performs well in limited-data settings. For example, it improves by 12.5 percentage points on MRPC and by 6.13 percentage points on RTE.
- **Code understanding**: On CRUXEval tasks, LoPA substantially improves over the baseline models. The gains are largest on larger foundation models such as Llama-3 and Phi-3, ranging from 8 to 11 percentage points.
- **Code generation**: On MBPP, LoPA achieves improvements comparable to IDPG while using significantly fewer parameters.

### Conclusion

LoPA combines task-specific and instance-specific information and uses low-rank decomposition to limit the parameter count. In doing so, it addresses both the scalability limitation of adapter-based PEFT methods and the performance gap of traditional prompt tuning. Experimental results show strong performance across a variety of tasks, making LoPA an efficient and effective way to customize foundation models.
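The parameter savings from the low-rank decomposition can be checked with simple arithmetic. Suppose the instance-specific component were a full m × d matrix. A linear head from an h-dimensional encoding would then need h·m·d weights. Generating the factors U (m × r) and V (r × d) instead needs h·r(m + d), a ratio of r/m + r/d.

The sketch below makes two assumptions. First, it uses bias-free heads. Second, its sizes are purely illustrative and not taken from the paper: m = 10 soft tokens, d = 4096, r = 4, h = 768.

```python
def head_params(m: int, d: int, r: int, h: int) -> dict:
    """Compare head sizes for a full-rank vs a low-rank instance component.

    Hypothetical setup: an h-dim instance encoding is mapped by bias-free
    linear heads either directly to an m x d matrix (full rank) or to the
    factors U (m x r) and V (r x d) (low rank).
    """
    full_rank = h * m * d
    low_rank = h * r * (m + d)
    return {"full_rank": full_rank, "low_rank": low_rank, "ratio": low_rank / full_rank}


counts = head_params(m=10, d=4096, r=4, h=768)
print(counts["full_rank"], counts["low_rank"])  # 31457280 12613632
print(round(counts["ratio"], 3))                # 0.401
```

When d is much larger than m, the ratio is close to r/m. Most of the saving therefore comes from choosing a rank r well below the number of soft tokens.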

Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation

LIPT: Improving Prompt Tuning with Late Inception Reparameterization

LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models

FPT: Improving Prompt Tuning Efficiency via Progressive Training

Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: an Empirical Study

APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion

P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks

No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Decomposed Prompt Tuning via Low-Rank Reparameterization

IAPT: Instance-Aware Prompt Tuning for Large Language Models

Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs

MoRe Fine-Tuning with 10x Fewer Parameters

Parameter-Efficient Fine-Tuning With Adapters

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

IAPT: Instruction-Aware Prompt Tuning for Large Language Models

Skeleton: A New Framework for Accelerating Language Models via Task Neuron Localized Prompt Tuning