Distinguishing Textual Prompt Importance: Image-Guided Text Weighting for CLIP-Based Few-shot Learning

Tianci Xun, Wei Chen, Yulin He, Di Wu, Yuanming Gao, Jiuyuan Zhu, Weiwei Zheng
DOI: https://doi.org/10.1109/icme57554.2024.10687683
2024-01-01
Abstract: Few-shot learning deals with learning a model capable of recognizing new classes when provided with limited labeled data. Recently, Contrastive Language-Image Pre-training (CLIP) based methods have shown significant potential in this field. However, because they treat all textual prompts equally, existing CLIP-based methods fail to establish a strong text classifier, which limits their performance. These textual prompts are obtained by querying large language models (LLMs), so it is unreasonable to assume uniform quality across all of them. To address this issue, this paper proposes an Image-Guided Text Weighting (IGTW) module, which uses the feature similarity between training images and textual prompts to guide the weighting of the prompts. Applying the method to a recent state-of-the-art method (CaFo) and a classic method (Tip-Adapter) yields consistent improvements across 11 few-shot learning datasets, demonstrating the effectiveness and generality of the approach.
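The abstract does not give the exact weighting rule, so the following is only an illustrative sketch of the general idea: score each prompt of a class by its similarity to that class's few-shot training images, then use the scores to weight the prompts when building the text classifier. The function names, the mean-over-shots scoring, the softmax, and the `temperature` parameter are all assumptions made for illustration and are not taken from the paper; the feature arrays stand in for CLIP image and text embeddings.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def image_guided_prompt_weights(image_feats, prompt_feats, temperature=0.1):
    """Hypothetical weighting rule (an assumption, not the paper's formula).

    image_feats:  (n_shots, d) features of one class's training images.
    prompt_feats: (n_prompts, d) features of that class's textual prompts.
    Returns an (n_prompts,) array of non-negative weights that sum to 1.
    """
    img = l2_normalize(image_feats)
    txt = l2_normalize(prompt_feats)
    sim = txt @ img.T                # (n_prompts, n_shots) cosine similarities
    score = sim.mean(axis=1)         # average similarity of each prompt to the shots
    logits = score / temperature
    logits = logits - logits.max()   # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def weighted_class_embedding(image_feats, prompt_feats, temperature=0.1):
    """Combine a class's prompts into one text-classifier vector using the weights."""
    w = image_guided_prompt_weights(image_feats, prompt_feats, temperature)
    txt = l2_normalize(prompt_feats)
    return l2_normalize(w @ txt)     # weighted average, renormalized to unit length
```

With equal weights this reduces to the uniform prompt averaging that the abstract criticizes; in the sketch, a prompt whose embedding lies closer to the class's training images simply contributes more to the class vector.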