AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, Qinghua Hu
2024-04-13
Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve the few-shot ability of CLIP, the key factors behind the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introduce a unified formulation to analyze CLIP-based few-shot learning methods from the perspective of logit bias, which encourages us to learn an effective logit bias for further improving the performance of CLIP-based few-shot learning methods. To this end, we disassemble three key components involved in the computation of logit bias (i.e., logit features, logit predictor, and logit fusion) and empirically analyze their effects on few-shot classification performance. Based on this analysis, this paper proposes a novel AMU-Tuning method to learn an effective logit bias for CLIP-based few-shot classification. Specifically, our AMU-Tuning predicts the logit bias by exploiting appropriate **A**uxiliary features, which are fed into an efficient feature-initialized linear classifier with **M**ulti-branch training. Finally, an **U**ncertainty-based fusion is developed to incorporate the logit bias into CLIP for few-shot classification. The experiments are conducted on several widely used benchmarks, and the results show that AMU-Tuning clearly outperforms its counterparts while achieving state-of-the-art performance for CLIP-based few-shot learning without bells and whistles.
Computer Vision and Pattern Recognition, Computation and Language, Machine Learning
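The abstract frames CLIP-based few-shot methods as zero-shot CLIP logits plus a learned logit bias. The sketch below illustrates that idea under two assumptions.

- **Additive fusion.** It assumes the fusion is additive, `s = s_0 + w * s_b`.
- **Uncertainty weight.** It contrasts a fixed weight `alpha` with an uncertainty-dependent weight. The uncertainty measure used here, the normalized entropy of the zero-shot softmax, is a hypothetical stand-in chosen for illustration. It is not necessarily the measure AMU-Tuning uses.

All function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy divided by log(K): 0 means fully confident, 1 means uniform."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def fuse_logits(zero_shot: Sequence[float], bias: Sequence[float],
                alpha: float, adaptive: bool = True) -> List[float]:
    """Additive logit-bias fusion: s = s_0 + w * s_b.

    adaptive=False: w is the fixed scalar alpha.
    adaptive=True:  w = alpha * u, where u is the normalized entropy of the
    zero-shot softmax. This uncertainty measure is an ASSUMPTION for
    illustration; the less confident zero-shot CLIP is, the larger w becomes.
    """
    if len(zero_shot) != len(bias):
        raise ValueError("zero-shot logits and logit bias must have equal length")
    w = alpha * normalized_entropy(softmax(zero_shot)) if adaptive else alpha
    return [s0 + w * sb for s0, sb in zip(zero_shot, bias)]


if __name__ == "__main__":
    bias = [0.0, 2.0, 0.0]        # hypothetical bias favouring class 1
    confident = [10.0, 0.0, 0.0]  # zero-shot CLIP is sure about class 0
    unsure = [1.0, 1.0, 1.0]      # zero-shot CLIP cannot decide
    print(fuse_logits(confident, bias, alpha=1.0))  # bias barely moves class 1
    print(fuse_logits(unsure, bias, alpha=1.0))     # bias decides: class 1 wins
```

With a fixed `alpha`, the same bias is added whether or not zero-shot CLIP is sure of its prediction. The adaptive weight instead lets the bias matter mainly where the zero-shot prediction is uncertain.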
What problem does this paper attempt to address?
This paper asks how to improve the few-shot performance of CLIP by learning an effective logit bias. The authors argue that existing methods have improved on CLIP, but the factors that make them effective have not been thoroughly studied. This gap limits further exploration of CLIP's potential in few-shot learning.

### Detailed description of the problem

1. **Limitations of existing methods**:
   - Many works have tried to improve the few-shot ability of CLIP, but these methods are only loosely connected to one another.
   - The effect of key factors such as the logit bias on performance has not been studied in depth, which limits further exploration of CLIP's potential in few-shot learning.
2. **Introduction of a unified framework**:
   - The authors introduce a unified formulation, from the perspective of logit bias, to analyze existing CLIP-based few-shot learning methods.
   - This formulation makes it possible to examine how the key components shared by different methods affect performance. These components are the logit features, the logit predictor, and the logit fusion.
3. **Specific problems**:
   - How can the logit bias be learned effectively to further improve CLIP-based few-shot learning?
   - How can auxiliary features be fully exploited to strengthen the logit bias, using techniques such as multi-branch training and uncertainty-based fusion?

### Overview of the solution

To solve these problems, the authors propose AMU-Tuning, which consists of three parts:

1. **Selection of auxiliary features**:
   - Select auxiliary features that complement CLIP's own features and perform well on their own, so that they yield an effective logit bias.
2. **Efficient logit predictor**:
   - Use a linear probe (LP) whose weights are initialized from the features, trained with a multi-branch strategy, to make the logit predictor both efficient and effective.
3. **Adaptive logit fusion**:
   - Use an uncertainty-based fusion that adjusts the influence of the logit bias according to the prediction confidence of zero-shot CLIP.

### Results and contributions

- **Experimental verification**: On multiple downstream tasks and out-of-distribution (OOD) benchmarks, AMU-Tuning clearly outperforms competing methods and achieves state-of-the-art CLIP-based few-shot performance.
- **Analytical contribution**: The paper introduces a unified formulation from the perspective of logit bias, offering a new lens for further research on CLIP-based few-shot learning.

With these improvements, AMU-Tuning raises CLIP's few-shot performance and also provides a useful reference for future research.
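The logit predictor described in the overview can be sketched as follows. This is an illustration only, not the authors' implementation, and it rests on three assumptions:

- **Feature initialization.** "Feature-initialized" is read as starting each class's weight vector at the normalized mean of that class's auxiliary support features.
- **Multi-branch loss.** Multi-branch training is read as adding a cross-entropy term on the auxiliary branch alone, weighted by a hypothetical coefficient `lam`, to the loss on the fused logits.
- **Hypothetical names.** `alpha`, `lam`, and all function names are illustrative, not taken from the paper.

The paper's exact loss and normalization may differ, and the optimizer loop is omitted.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def init_classifier_from_features(features: Sequence[Sequence[float]],
                                  labels: Sequence[int],
                                  num_classes: int) -> List[List[float]]:
    """Feature-initialized linear probe (assumed scheme): row c is the
    normalized mean of the L2-normalized support features of class c."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, y in zip(features, labels):
        for j, x in enumerate(l2_normalize(feat)):
            sums[y][j] += x
        counts[y] += 1
    if 0 in counts:
        raise ValueError("every class needs at least one support sample")
    return [l2_normalize([s / n for s in row]) for row, n in zip(sums, counts)]


def linear_logits(weights: Sequence[Sequence[float]],
                  feature: Sequence[float]) -> List[float]:
    """Logit bias s_b: dot product of each class row with the normalized feature."""
    f = l2_normalize(feature)
    return [sum(w * x for w, x in zip(row, f)) for row in weights]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, computed with log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multi_branch_loss(zero_shot: Sequence[float], aux_logits: Sequence[float],
                      target: int, alpha: float, lam: float) -> float:
    """Hypothetical multi-branch objective: CE on the fused logits plus
    lam * CE on the auxiliary branch alone, so the bias is also trained to
    classify by itself."""
    fused = [s0 + alpha * sb for s0, sb in zip(zero_shot, aux_logits)]
    return cross_entropy(fused, target) + lam * cross_entropy(aux_logits, target)


if __name__ == "__main__":
    # Toy 2-class, 2-D "auxiliary features" (made-up numbers for illustration).
    feats = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
    labels = [0, 0, 1, 1]
    W = init_classifier_from_features(feats, labels, num_classes=2)
    s_b = linear_logits(W, [0.9, 0.2])
    print(s_b)  # class 0 scores higher
    print(multi_branch_loss([0.5, 0.5], s_b, target=0, alpha=1.0, lam=0.5))
```

Starting from class means gives the probe a sensible prediction before any training step. The auxiliary-only loss term keeps the bias branch discriminative on its own instead of only correcting CLIP.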