Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
2023-03-06
Abstract: Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it has not been clear how to exploit the rich cross-task knowledge with prompt vectors in a multitask learning setting. We propose multitask prompt tuning (MPT), which first learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts. We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task. Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms the state-of-the-art methods, including the full finetuning baseline in some cases, despite only tuning 0.035% as many task-specific parameters.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how to transfer large pretrained language models (PLMs) efficiently across multiple downstream tasks. Existing prompt-tuning methods typically learn soft prompts from scratch for each task, so they do not exploit knowledge shared across tasks. The paper proposes Multitask Prompt Tuning (MPT), which learns a single transferable prompt by distilling knowledge from multiple source tasks and then adapts that shared prompt to each downstream target task while adding very few task-specific parameters. The main contributions are:

1. **A new multitask prompt tuning framework**: a shared prompt matrix is first learned from multiple source tasks through prompt decomposition and knowledge distillation; this shared prompt is then adapted to each target task through multiplicative low-rank updates.
2. **Strong empirical results**: experiments on 23 NLP datasets show that MPT outperforms existing state-of-the-art methods across multiple benchmarks, and in some cases even the full fine-tuning baseline, while tuning only 0.035% as many task-specific parameters as full fine-tuning.
3. **Effectiveness in few-shot learning**: MPT performs well with only a small number of labeled examples, suggesting that it remains effective in resource-limited scenarios.

Taken together, MPT improves performance while sharply reducing the number of parameters that must be trained and stored for each task.
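The decomposition, low-rank update, and distillation described above can be illustrated with a small pure-Python sketch. It assumes, in line with the "multiplicative low rank updates" wording of the abstract, that a task prompt is the element-wise (Hadamard) product of the shared prompt `P*` (shape `l × d`) with a low-rank matrix `W = Σ_r u_r v_rᵀ`, and that distillation is represented by a plain KL divergence between a teacher's (source prompt) and a student's output distribution. The authors' actual objective and training loop may include further terms. All function names and the toy sizes are hypothetical and chosen for illustration only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def low_rank_update(us: List[Vector], vs: List[Vector]) -> Matrix:
    """Build W = sum_r u_r v_r^T, a matrix of rank at most len(us).

    Assumption: each u_r has length l (prompt length) and each v_r has
    length d (hidden size); these factors are the only task-specific
    parameters in this sketch.
    """
    l, d = len(us[0]), len(vs[0])
    w = [[0.0] * d for _ in range(l)]
    for u, v in zip(us, vs):
        for i in range(l):
            for j in range(d):
                w[i][j] += u[i] * v[j]
    return w


def task_prompt(shared: Matrix, us: List[Vector], vs: List[Vector]) -> Matrix:
    """Adapt the shared prompt multiplicatively: P_k = P* (element-wise) W_k."""
    w = low_rank_update(us, vs)
    return [[p * x for p, x in zip(p_row, w_row)]
            for p_row, w_row in zip(shared, w)]


def task_specific_param_count(l: int, d: int, rank: int = 1) -> int:
    """Per-task parameters for the low-rank factors: rank * (l + d)."""
    return rank * (l + d)


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q): p from the teacher (source prompt), q from the student."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    shared = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # toy P*: l=2 tokens, d=3 dims
    prompt = task_prompt(shared, us=[[1.0, 0.5]], vs=[[2.0, 1.0, 0.0]])
    print(prompt)  # [[2.0, 2.0, 0.0], [4.0, 2.5, 0.0]]

    # Hypothetical sizes: a 100-token prompt in a 768-dimensional model.
    print(task_specific_param_count(100, 768), 100 * 768)  # 868 76800

    teacher = softmax([2.0, 0.5, -1.0])
    student = softmax([1.5, 0.7, -0.8])
    print(kl_divergence(teacher, student) >= 0.0)  # True
```

Because only the factors `u` and `v` are task-specific in this sketch, per-task storage grows as `rank * (l + d)` rather than `l * d`, while the shared prompt is stored once and reused by every task.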