Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Kun Ding, Haojian Zhang, Qiang Yu, Ying Wang, Shiming Xiang, Chunhong Pan
2024-03-31
Abstract: We propose a general method for boosting the generalization ability of pre-trained vision-language models (VLMs) when fine-tuning on downstream few-shot tasks. The idea is to use out-of-distribution (OOD) detection to predict whether a sample belongs to a base distribution or a novel distribution, and then to use the score generated by a dedicated competition-based scoring function to fuse the zero-shot and few-shot classifiers. The fused classifier is dynamic: it biases towards the zero-shot classifier if a sample is more likely to come from the distribution the model was pre-trained on, leading to improved base-to-novel generalization. Our method is applied only at the test stage, so it can boost existing methods without time-consuming re-training. Extensive experiments show that even weak distribution detectors can still improve VLMs' generalization ability. Specifically, with the help of OOD detectors, the harmonic means of CoOp and ProGrad increase by 2.6 and 1.5 percentage points, respectively, over 11 recognition datasets in the base-to-novel setting.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the imbalance in generalization between base classes and novel classes in Vision-Language Prompt Tuning (VLPT). Specifically, existing VLPT methods such as CoOp perform well on base classes but poorly on novel classes, which limits the model's generalization to unseen data. To this end, the authors use Out-of-Distribution (OOD) detection to dynamically fuse the zero-shot and few-shot classifiers, improving the balance between base-class and novel-class accuracy.

### Method overview

1. **OOD Detection**: The authors use existing OOD detection methods to predict whether a sample belongs to the base distribution or the novel distribution. These methods include Maximum Softmax Probability (MSP), MaxLogit, and the Energy function.
2. **Competitive Scoring Function**: A competition-based scoring function converts the OOD detector outputs into a score between 0 and 1 for each test sample. This score is used to dynamically fuse the zero-shot and few-shot classifiers.
3. **Dynamic Fusion**: The score sets the weights of the zero-shot and few-shot classifiers for each sample. If the sample is more likely to come from the base distribution, more reliance is placed on the few-shot classifier; otherwise, more reliance is placed on the zero-shot classifier.

### Experimental results

The authors conducted experiments on multiple datasets to verify the effectiveness of the proposed method. The results show that even a weak OOD detector can improve the generalization ability of existing VLPT methods.
Specifically:

- **Generalization from base classes to novel classes**: On 11 recognition datasets, the proposed method increased the harmonic mean accuracy of CoOp and ProGrad by 2.6 and 1.5 percentage points, respectively.
- **Domain generalization**: In the cross-dataset domain generalization setting, the average accuracy of the proposed method on the target domain is also competitive.

### Main contributions

1. **New perspective**: The paper proposes using OOD detection to address the base-to-novel generalization problem in VLPT.
2. **Competitive scoring function**: A competition-based scoring function is designed for dynamically fusing the zero-shot and few-shot classifiers.
3. **Experimental verification**: Experiments verify that even a weak distribution detector can improve the generalization ability of existing VLPT methods.

### Conclusion

By introducing OOD detection and dynamic fusion, this paper addresses the imbalance in generalization between base classes and novel classes in VLPT and improves base-to-novel generalization. Because the method is applied only at the test stage, it can be added to existing VLPT methods without retraining the model.
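The dynamic fusion described in the method overview can be illustrated with a minimal NumPy sketch. The three detector scores (MSP, MaxLogit, Energy) follow their standard definitions, but the paper's competition-based scoring function is not specified in this summary, so `fusion_weight` below is a hypothetical stand-in: it compares the detector score of the few-shot classifier with that of the zero-shot classifier and squashes the gap with a logistic function. The function names, the `sharpness` parameter, and the assumption that both classifiers output logits over the same class set are all illustrative choices, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def msp_score(logits):
    """Maximum Softmax Probability: higher means more in-distribution."""
    return softmax(logits).max(axis=-1)


def max_logit_score(logits):
    """MaxLogit: the largest raw logit."""
    return logits.max(axis=-1)


def energy_score(logits, temperature=1.0):
    """Negative free energy, T * logsumexp(logits / T)."""
    scaled = logits / temperature
    peak = scaled.max(axis=-1)
    return temperature * (peak + np.log(np.exp(scaled - peak[..., None]).sum(axis=-1)))


def fusion_weight(few_shot_logits, zero_shot_logits, score_fn=msp_score, sharpness=10.0):
    """Hypothetical scoring function (an assumption, not the paper's):
    returns a weight in (0, 1) that grows when the few-shot classifier
    looks more in-distribution than the zero-shot classifier."""
    gap = score_fn(few_shot_logits) - score_fn(zero_shot_logits)
    return 1.0 / (1.0 + np.exp(-sharpness * gap))


def fuse(few_shot_logits, zero_shot_logits, **kwargs):
    """Per-sample convex combination of the two classifiers' probabilities."""
    w = fusion_weight(few_shot_logits, zero_shot_logits, **kwargs)[..., None]
    return w * softmax(few_shot_logits) + (1.0 - w) * softmax(zero_shot_logits)


# Toy batch: sample 0 is confidently handled by the few-shot classifier,
# sample 1 by the zero-shot classifier.
few = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
zero = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 10.0]])
weights = fusion_weight(few, zero)
fused = fuse(few, zero)
print(weights)
print(fused.argmax(axis=-1))
```

Because the weight is computed per sample from the logits alone, this kind of fusion needs no gradient updates, which matches the summary's point that the method runs only at the test stage. Swapping `score_fn` for `max_logit_score` or `energy_score` changes the detector without touching the fusion step.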