Hierarchy-Aware Interactive Prompt Learning for Few-Shot Classification
Xiaotian Yin, Jiamin Wu, Wenfei Yang, Xu Zhou, Shifeng Zhang, Tianzhu Zhang
DOI: https://doi.org/10.1109/tcsvt.2024.3432753
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Few-Shot Learning (FSL) leverages prior knowledge and generalization strategies to quickly adapt to new tasks or recognize new objects with minimal input. Recently, CLIP-based methods, aided by contrastive language-image pre-training, have demonstrated impressive few-shot performance. However, these methods employ only fixed-length uni-modal prompts at the initial encoder layer, neglecting multi-level adaptation and cross-modal interaction for the intermediate features. To address this issue, we propose Hierarchy-Aware Interactive Prompt Learning (HIPL), which jointly explores hierarchical prompt learning and cross-modal prompt interaction for CLIP-based few-shot classification (FSC). The proposed HIPL has two main merits. First, we design a hierarchical prompt aggregation module that progressively generates higher-level prompts via attention mechanisms, equipping CLIP with hierarchical adaptation capability. Second, we propose a cross-modal prompt interaction module that facilitates deep interaction between stage-wise prompts, ensuring mutual synergy between visual and textual features. To the best of our knowledge, this is the first work to learn multi-level prompts by progressive aggregation. Our extensive experiments demonstrate that HIPL outperforms previous methods in few-shot classification and base-to-new generalization. Our code is available at https://github.com/Yxt1212/HIPL.
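The two ideas named in the abstract (attention-based aggregation of prompts across stages, and interaction between vision and text prompts at each stage) can be illustrated with a toy sketch. This is not the authors' implementation, which is in the linked repository. Everything here is an assumption made for illustration: the function names, the use of single-head dot-product attention without learned projections, the residual updates, the shared embedding width for both modalities, and the random stand-ins for encoder features.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, context):
    """Scaled dot-product attention with no learned projections (toy assumption).

    Returns (outputs, weights); each output row is a weighted mix of context rows.
    """
    scale = math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * c[j] for wi, c in zip(w, context))
                        for j in range(len(q))])
    return outputs, weights

def residual(x, update):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(r, u)] for r, u in zip(x, update)]

def aggregate_prompts(prev_prompts, stage_features):
    """Hypothetical aggregation step: prompts from the previous stage attend
    to the current stage's features to form the next-stage prompts."""
    update, _ = attend(prev_prompts, stage_features)
    return residual(prev_prompts, update)

def interact(vision_prompts, text_prompts):
    """Hypothetical interaction step: each modality's prompts attend to the
    other modality's prompts (assumes both share one embedding width)."""
    v_update, _ = attend(vision_prompts, text_prompts)
    t_update, _ = attend(text_prompts, vision_prompts)
    return residual(vision_prompts, v_update), residual(text_prompts, t_update)

def random_matrix(rng, rows, cols):
    """Random stand-in for learned prompts or encoder features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def run_stages(num_stages=3, num_prompts=4, dim=8, seed=0):
    """Run aggregation then interaction at each stage, on random toy data."""
    rng = random.Random(seed)
    vision_prompts = random_matrix(rng, num_prompts, dim)
    text_prompts = random_matrix(rng, num_prompts, dim)
    for _ in range(num_stages):
        vision_feats = random_matrix(rng, 16, dim)  # stand-in for patch tokens
        text_feats = random_matrix(rng, 10, dim)    # stand-in for word tokens
        vision_prompts = aggregate_prompts(vision_prompts, vision_feats)
        text_prompts = aggregate_prompts(text_prompts, text_feats)
        vision_prompts, text_prompts = interact(vision_prompts, text_prompts)
    return vision_prompts, text_prompts

if __name__ == "__main__":
    v, t = run_stages()
    print(len(v), len(v[0]), len(t), len(t[0]))
```

The number of prompts stays fixed across stages in this sketch, while their content is refreshed from each stage's features and from the other modality. A real model would use learned query, key, and value projections and would need a mapping between the vision and text embedding widths.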