Meta-Prompt Tuning Vision-Language Model for Multi-Label Few-Shot Image Recognition

Feng Zhang, Wei Chen, Fei Ding, Tengjiao Wang, Dawei Lu, Jiabin Zheng
DOI: https://doi.org/10.1145/3627673.3679963
2024-01-01
Abstract: Multi-label few-shot image recognition aims to identify multiple unseen objects using only a handful of examples. Recent methods typically tune pre-trained vision-language models with shared or class-specific prompts. However, both approaches have drawbacks. Tuning a single shared prompt is insufficient for all samples, especially when the tasks are complex, and tuning a specific prompt for each class inevitably loses generalization ability; as a result, these methods fail to capture diverse visual knowledge. To address these issues, we propose to meta-tune a generalized prompt pool, enabling each prompt to act as an expert for multi-label few-shot image recognition. Specifically, we first construct a diverse prompt pool to handle complex samples and tasks effectively. We then design a meta-tuning strategy that learns meta-knowledge and transfers it from source tasks to target tasks, enhancing the generalization of the prompts. Extensive experimental results on two widely used multi-label image recognition datasets demonstrate the effectiveness of our method.