InsVP: Efficient Instance Visual Prompting from Image Itself

Zichen Liu, Yuxin Peng, Jiahuan Zhou
DOI: https://doi.org/10.1145/3664647.3681233
2024-01-01
Abstract: Visual prompting is an efficient methodology for finetuning pretrained visual models by introducing a small number of learnable parameters while keeping the backbone frozen. However, most existing visual prompting methods learn a single prompt shared by all samples, which makes it difficult to capture the distinct characteristics of diverse samples and thereby limits the model's performance. While other methods partially address this issue through sample clustering and learning multiple prompts, they still struggle to capture nuanced differences among instances and incur significant parameter overhead. Therefore, to comprehensively and efficiently leverage the discriminative characteristics of individual instances, we propose an Instance Visual Prompting method, called InsVP. First, the instance image prompt is introduced to extract both crucial and nuanced discriminative information from the original image itself and is overlaid onto the input image. Second, the instance feature prompt is designed to capture both the commonalities and the distinctive characteristics of individual instances, and is fed into the model's intermediate layers to facilitate feature extraction. Consequently, the instance image and feature prompts complement each other, enhancing the ability of pretrained models to adapt and extract discriminative features from individual instances. Extensive experiments on various large-scale benchmarks show that InsVP outperforms state-of-the-art methods at a lower parameter cost. The code is available at https://github.com/zhoujiahuan1991/MM2024-InsVP.
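The two prompt types described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the prompts are generated or injected, so the channel-mixing generator, the `scale` factor, the mean-pooling step, and the function names `instance_image_prompt` and `instance_feature_prompt` are all assumptions made here for illustration. The linked repository is the reference for the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)


def instance_image_prompt(image, weight, scale=0.1):
    """Derive a prompt from the image itself and overlay it on the input.

    image:  (H, W, C) float array
    weight: (C, C) learnable channel-mixing matrix (hypothetical generator)
    scale:  overlay strength (hypothetical hyperparameter)
    """
    prompt = np.tanh(image @ weight)   # instance-specific, same shape as image
    return image + scale * prompt      # overlay as element-wise addition


def instance_feature_prompt(tokens, shared_prompt, proj):
    """Prepend prompt tokens that combine shared and instance-specific parts.

    tokens:        (N, D) intermediate-layer tokens for one image
    shared_prompt: (P, D) learnable tokens shared across all instances
    proj:          (D, D) learnable projection of the pooled instance feature
    """
    instance_part = tokens.mean(axis=0) @ proj        # (D,)
    prompts = shared_prompt + instance_part           # broadcast to (P, D)
    return np.concatenate([prompts, tokens], axis=0)  # (P + N, D)


# Toy inputs; in practice the backbone stays frozen and only
# weight, shared_prompt, and proj would be trained.
image = rng.random((8, 8, 3))
weight = rng.normal(size=(3, 3))
prompted_image = instance_image_prompt(image, weight)

tokens = rng.normal(size=(16, 32))
shared_prompt = rng.normal(size=(4, 32))
proj = rng.normal(size=(32, 32))
prompted_tokens = instance_feature_prompt(tokens, shared_prompt, proj)
```

In this sketch the image prompt keeps the input resolution, so the frozen backbone sees an input of the usual shape, while the feature prompt lengthens the token sequence by `P` tokens at the layer where it is injected.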