Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve the few-shot ability of CLIP, the key factors behind the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introduce a unified formulation to analyze CLIP-based few-shot learning methods from the perspective of logit bias, which encourages us to learn an effective logit bias to further improve the performance of CLIP-based few-shot learning methods. To this end, we disassemble three key components involved in the computation of logit bias (i.e., logit features, logit predictor, and logit fusion) and empirically analyze their effect on the performance of few-shot classification. Based on this analysis of the key components, this paper proposes a novel AMU-Tuning method to learn an effective logit bias for CLIP-based few-shot classification. Specifically, our AMU-Tuning predicts the logit bias by exploiting appropriate $\underline{\textbf{A}}$uxiliary features, which are fed into an efficient feature-initialized linear classifier with $\underline{\textbf{M}}$ulti-branch training. Finally, an $\underline{\textbf{U}}$ncertainty-based fusion is developed to incorporate the logit bias into CLIP for few-shot classification. Experiments are conducted on several widely used benchmarks, and the results show that AMU-Tuning clearly outperforms its counterparts while achieving state-of-the-art performance in CLIP-based few-shot learning without bells and whistles.
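
Read schematically, the logit-bias view described in the abstract amounts to adding a learned correction to the zero-shot CLIP logits. The symbols below are assumptions chosen for illustration and are not necessarily the paper's own notation:

$$
s = s_{\mathrm{CLIP}} + \beta \, s_{\mathrm{bias}}, \qquad s_{\mathrm{bias}} = g\left(f_{\mathrm{aux}}\right)
$$

Here $f_{\mathrm{aux}}$ stands for the logit features, $g$ for the logit predictor, and the weight $\beta$ (which may depend on the input) for the logit fusion, matching the three components named in the abstract.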

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to improve the performance of CLIP in few-shot learning by introducing an effective logit bias. Specifically, the authors argue that although existing methods improve on CLIP, the factors behind their effectiveness have not been fully studied, which limits the exploration of CLIP's potential in few-shot learning.

### Detailed description of the problem

1. **Limitations of existing methods**:
   - Although many works have attempted to improve the few-shot learning ability of CLIP, the relationships between these methods are relatively loose.
   - The influence of key factors such as logit bias on performance has not been studied in depth, which limits further exploration of CLIP's potential in few-shot learning.
2. **Introduction of a unified framework**:
   - The authors introduce a unified formulation, from the perspective of logit bias, to analyze existing CLIP-based few-shot learning methods.
   - This framework allows a closer look at how the key components involved in different methods (logit features, logit predictor, and logit fusion) affect performance.
3. **Specific problems**:
   - How can logit bias be learned effectively to further improve the performance of CLIP-based few-shot learning methods?
   - How can auxiliary features be fully exploited, through techniques such as multi-branch training and uncertainty-based fusion, to strengthen the effect of the logit bias?

### Overview of the solution

To solve the above problems, the authors propose the AMU-Tuning method, which consists of the following parts:

1. **Selection of auxiliary features**:
   - Select appropriate auxiliary features, which should be both complementary and superior, to help compute an effective logit bias.
2. **Efficient logit predictor**:
   - A feature-initialized linear probe (LP) is proposed and combined with a multi-branch training strategy to improve the efficiency and effectiveness of the logit predictor.
3. **Adaptive logit fusion**:
   - An uncertainty-based fusion method is introduced to adaptively adjust the influence of the logit bias according to the prediction confidence of zero-shot CLIP.

### Results and contributions

- **Experimental verification**: Across multiple downstream tasks and OOD benchmarks, AMU-Tuning clearly outperforms other methods and achieves state-of-the-art few-shot learning performance on top of CLIP.
- **Theoretical contribution**: For the first time, a unified formulation is introduced from the perspective of logit bias, providing a new angle for further research on CLIP's few-shot learning.

Through these improvements, AMU-Tuning not only improves the performance of CLIP in few-shot learning but also provides a useful reference for future research.
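
The feature-initialized predictor and the confidence-dependent fusion described above can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, "feature initialization" is read as starting each class weight from the mean auxiliary feature of that class, and uncertainty is taken as one minus the maximum softmax probability of the zero-shot CLIP logits. The paper's actual initialization and uncertainty measure may differ, and multi-branch training is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_weights_from_features(features, labels, num_classes):
    """Assumed reading of 'feature initialization': each class weight row
    starts as the mean auxiliary feature of that class's few-shot samples."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, label in zip(features, labels):
        counts[label] += 1
        for i, value in enumerate(feat):
            sums[label][i] += value
    # max(..., 1) avoids division by zero for a class with no samples.
    return [[v / max(counts[c], 1) for v in sums[c]] for c in range(num_classes)]


def predict_logit_bias(weights, aux_feature):
    """Linear logit predictor: one dot product per class."""
    return [sum(w * f for w, f in zip(row, aux_feature)) for row in weights]


def fuse_logits(clip_logits, logit_bias, beta=1.0):
    """Illustrative uncertainty-based fusion: the less confident zero-shot
    CLIP is, the more the logit bias contributes. The uncertainty measure
    (1 - max softmax probability) is an assumption, not the paper's formula."""
    uncertainty = 1.0 - max(softmax(clip_logits))
    return [s + beta * uncertainty * b for s, b in zip(clip_logits, logit_bias)]


# Toy usage: three few-shot samples, two classes, 2-d auxiliary features.
weights = init_weights_from_features(
    [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1], num_classes=2
)
bias = predict_logit_bias(weights, [0.0, 1.0])
fused = fuse_logits([1.0, 1.0], bias)
```

In this sketch a confident zero-shot prediction leaves the CLIP logits almost unchanged, while an ambiguous one lets the auxiliary branch break the tie, which is the qualitative behaviour the summary attributes to adaptive fusion.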

AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

Feature Transformation for Few-Shot Learning

Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification

Enhancing Few-Shot CLIP With Semantic-Aware Fine-Tuning

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification

Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models

Multimodal CLIP Inference for Meta-Few-Shot Image Classification

Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners

Leveraging Biases in Large Language Models: "bias-kNN'' for Effective Few-Shot Learning

MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning

Improving the Generalization of MAML in Few-shot Classification via Bi-level Constraint

UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models

Boosting Few-Shot Classification with View-Learnable Contrastive Learning

Enhancing Few-Shot Classification without Forgetting through Multi-Level Contrastive Constraints

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

Binocular Mutual Learning for Improving Few-shot Classification

DiffCLIP: Few-shot Language-driven Multimodal Classifier

Transductive Zero-Shot and Few-Shot CLIP

The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning