Abstract: Prompt tuning, which conditions a model on task-specific learned prompt vectors, has emerged as a data-efficient and parameter-efficient method for adapting large pretrained vision-language models to multiple downstream tasks. However, existing approaches usually learn prompt vectors for each task independently from scratch, thereby failing to exploit the rich shareable knowledge across different vision-language tasks. In this paper, we propose multitask vision-language prompt tuning (MVLPT), which incorporates cross-task knowledge into prompt tuning for vision-language models. Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show that many target tasks can benefit from each other by sharing prompt vectors and thus can be jointly learned via multitask prompt tuning. We benchmark the proposed MVLPT using three representative prompt tuning methods, namely text prompt tuning, visual prompt tuning, and unified vision-language prompt tuning. Results on 20 vision tasks demonstrate that the proposed approach outperforms all single-task baseline prompt tuning methods, setting a new state of the art on the few-shot ELEVATER benchmarks and cross-task generalization benchmarks. To understand where the cross-task knowledge is most effective, we also conduct a large-scale study on task transferability with 20 vision tasks in 400 combinations for each prompt tuning method. The study shows that the most performant MVLPT for each prompt tuning method prefers different task combinations, and that many tasks can benefit each other, depending on their visual similarity and label similarity. Code is available at https://github.com/sIncerass/MVLPT.
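The two-stage idea in (i) can be illustrated with a toy sketch. This is not the authors' implementation: the frozen linear map, the additive prompt, the synthetic tasks, and every name below (`forward`, `make_task`, `tune`, and so on) are assumptions chosen only to make the example self-contained. It learns one shared prompt across several related source tasks while the "pretrained" weights stay frozen, then uses that prompt to initialize tuning on a small target task.

```python
import random

DIM, OUT = 4, 6  # toy prompt/input size and output size (assumed values)

def make_frozen_weights(rng):
    # Stand-in for a pretrained model: a fixed linear map that is never updated.
    return [[rng.gauss(0, 1) / OUT ** 0.5 for _ in range(OUT)] for _ in range(DIM)]

def forward(weights, x, prompt):
    # Toy conditioning (an assumption): the prompt is added to the input
    # before the frozen map is applied.
    h = [xi + pi for xi, pi in zip(x, prompt)]
    return [sum(h[i] * weights[i][j] for i in range(DIM)) for j in range(OUT)]

def make_task(rng, weights, task_prompt, n=8):
    # Synthetic task whose targets are produced by a hidden task-specific prompt.
    xs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(n)]
    ys = [forward(weights, x, task_prompt) for x in xs]
    return xs, ys

def loss_and_grad(weights, task, prompt):
    # Mean squared error and its gradient with respect to the prompt only.
    xs, ys = task
    loss, grad = 0.0, [0.0] * DIM
    for x, y in zip(xs, ys):
        err = [p - t for p, t in zip(forward(weights, x, prompt), y)]
        loss += sum(e * e for e in err) / len(xs)
        for i in range(DIM):
            grad[i] += 2 * sum(err[j] * weights[i][j] for j in range(OUT)) / len(xs)
    return loss, grad

def tune(weights, tasks, init_prompt, steps, lr=0.05):
    # Gradient descent on the prompt, averaging gradients over all given tasks.
    prompt = list(init_prompt)
    for _ in range(steps):
        grads = [loss_and_grad(weights, task, prompt)[1] for task in tasks]
        for i in range(DIM):
            prompt[i] -= lr * sum(g[i] for g in grads) / len(tasks)
    return prompt

rng = random.Random(0)
weights = make_frozen_weights(rng)

def related_prompt():
    # Related tasks share a common component plus small task-specific noise.
    return [1.0 + rng.gauss(0, 0.1) for _ in range(DIM)]

source_tasks = [make_task(rng, weights, related_prompt()) for _ in range(4)]
target_task = make_task(rng, weights, related_prompt(), n=2)  # few-shot target

zero_prompt = [0.0] * DIM
# Stage 1: learn a single shared prompt jointly on all source tasks.
shared_prompt = tune(weights, source_tasks, zero_prompt, steps=500)
# Stage 2: initialize the target prompt from the shared prompt, then adapt it.
scratch_loss = loss_and_grad(weights, target_task, zero_prompt)[0]
transfer_loss = loss_and_grad(weights, target_task, shared_prompt)[0]
target_prompt = tune(weights, [target_task], shared_prompt, steps=20)
final_loss = loss_and_grad(weights, target_task, target_prompt)[0]
```

In this synthetic setting the tasks are related by construction, so the shared prompt starts the target task at a lower loss than a zero-initialized prompt. Whether real tasks benefit in the same way depends on how similar they are, which is the question the transferability study in the abstract addresses.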

Parameter Efficient Multi-task Fine-tuning by Learning to Transfer Token-wise Prompts

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

FPT: Improving Prompt Tuning Efficiency Via Progressive Training

On Transferability of Prompt Tuning for Natural Language Processing

Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion

LIPT: Improving Prompt Tuning with Late Inception Reparameterization

Improving Prompt Tuning with Learned Prompting Layers

Instance-wise Prompt Tuning for Pretrained Language Models

Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

Pro-tuning: Unified Prompt Tuning for Vision Tasks

Towards Unified Prompt Tuning for Few-shot Text Classification

Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning

Enhancing Few-Shot Transfer Learning with Optimized Multi-Task Prompt Tuning through Modular Prompt Composition

APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models

Dynamic Prompting: A Unified Framework for Prompt Tuning

LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models

Prompt Tuning for Unified Multimodal Pretrained Models

Effectively Prompting Small-sized Language Models for Cross-lingual Tasks via Winning Tickets

Multitask Vision-Language Prompt Tuning

Revisiting the Power of Prompt for Visual Tuning