Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng

2024-03-30

Abstract: Low-shot image classification is a fundamental task in computer vision, and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field. However, most existing CLIP-based methods lack the flexibility to effectively incorporate other pre-trained models that encompass knowledge distinct from CLIP. To bridge the gap, this work proposes a simple and effective probabilistic model ensemble framework based on Gaussian processes, which have previously demonstrated remarkable efficacy in processing small data. We achieve the integration of prior knowledge by specifying the mean function with CLIP and the kernel function with an ensemble of deep kernels built upon various pre-trained models. By regressing the classification label directly, our framework enables analytical inference, straightforward uncertainty quantification, and principled hyper-parameter tuning. Through extensive experiments on standard benchmarks, we demonstrate that our method consistently outperforms competitive ensemble baselines regarding predictive performance. Additionally, we assess the robustness of our method and the quality of the yielded uncertainty estimates on out-of-distribution datasets. We also illustrate that our method, despite relying on label regression, still enjoys superior model calibration compared to most deterministic baselines.
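The analytical inference mentioned in the abstract can be made concrete with the standard Gaussian process regression equations. The block below is a sketch under stated assumptions, not the paper's exact formulation: the symbols $h_i$ (feature extractor of the $i$-th pre-trained model), $w_i$ (kernel weights), $\sigma^2$ (observation noise), and the weighted-sum form of the kernel ensemble are all illustrative choices. $X$ denotes the $N$ labeled images, $Y$ their one-hot labels over $C$ classes, and $m$ the CLIP-based prior mean.

```latex
\begin{align}
f &\sim \mathcal{GP}\big(m(\cdot),\, k(\cdot,\cdot)\big), \qquad
k(x, x') = \sum_{i=1}^{M} w_i\, k_i\big(h_i(x),\, h_i(x')\big), \\
\mu(x_*) &= m(x_*) + k(x_*, X)\,\big[K + \sigma^2 I\big]^{-1}\big(Y - m(X)\big), \\
\sigma_*^2(x_*) &= k(x_*, x_*) - k(x_*, X)\,\big[K + \sigma^2 I\big]^{-1} k(X, x_*), \\
\log p(Y \mid X) &= \sum_{c=1}^{C} \Big[ -\tfrac{1}{2}\, r_c^\top \big[K + \sigma^2 I\big]^{-1} r_c
  - \tfrac{1}{2} \log \big|K + \sigma^2 I\big| - \tfrac{N}{2} \log 2\pi \Big],
  \qquad r_c = y_c - m_c(X).
\end{align}
```

Here $K = k(X, X)$. The posterior mean $\mu(x_*)$ corrects the CLIP prior by a kernel-weighted residual, the posterior variance $\sigma_*^2(x_*)$ gives the uncertainty estimate in closed form, and the log marginal likelihood is the usual objective for tuning hyper-parameters such as $w_i$ and $\sigma^2$ in GP regression.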

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how to improve low-shot image classification by combining CLIP with other pre-trained models. Most existing CLIP-based methods lack the flexibility to integrate knowledge from pre-trained models that differs from what CLIP encodes. To close this gap, the paper proposes a probabilistic model ensemble framework based on Gaussian processes (GPs), which have previously proven effective on small datasets. The authors specify the prior mean function with CLIP and the kernel function with an ensemble of deep kernels built on various pre-trained models, so that prior knowledge from the different models is integrated in a single GP. Because the framework regresses the classification labels directly, it supports analytical inference, straightforward uncertainty quantification, and principled hyperparameter tuning. Experiments on standard benchmarks show that the method outperforms competitive ensemble baselines in predictive performance. The paper also evaluates the method's robustness and the quality of its uncertainty estimates on out-of-distribution datasets. Despite relying on label regression, the method is better calibrated than most deterministic baselines. The paper reviews related work on zero-shot and low-shot classification, the use of pre-trained models in vision and other domains, and deep Gaussian processes. By adopting Bayesian learning, specifically GP regression, it targets the overfitting and inaccurate uncertainty estimation that existing methods may suffer from. Finally, the paper provides an algorithm overview, the experimental settings, an evaluation of predictive performance, and an analysis of robustness and uncertainty estimation on out-of-distribution data.
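The framework described above (a GP over one-hot labels with a CLIP-based prior mean and a kernel ensemble) can be sketched in a few lines of NumPy. This is a minimal illustration under loud assumptions, not the authors' implementation: the features are random stand-ins for embeddings from two pre-trained models, the prior mean is a uniform placeholder where CLIP zero-shot probabilities would go, fixed RBF kernels replace learned deep kernels, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """RBF kernel between the rows of a (n, d) and b (m, d)."""
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def ensemble_kernel(feats_a, feats_b, weights):
    """Weighted sum of one kernel per model (assumed form of the ensemble)."""
    return sum(w * rbf_kernel(fa, fb) for w, fa, fb in zip(weights, feats_a, feats_b))

def gp_label_regression(train_feats, train_onehot, train_prior,
                        test_feats, test_prior, weights, noise=0.1):
    """Closed-form GP posterior mean and variance for label regression."""
    k_tt = ensemble_kernel(train_feats, train_feats, weights)
    k_st = ensemble_kernel(test_feats, train_feats, weights)
    # For RBF kernels k_i(x, x) = 1, so the prior variance is the weight sum.
    k_ss_diag = float(sum(weights)) * np.ones(len(test_feats[0]))
    chol = np.linalg.cholesky(k_tt + noise * np.eye(len(k_tt)))
    # alpha = (K + noise * I)^{-1} (Y - m(X)), computed via the Cholesky factor.
    resid = train_onehot - train_prior
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, resid))
    mean = test_prior + k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = k_ss_diag - (v**2).sum(0)
    return mean, var

# Toy data: 3 classes, 4 shots each, two stand-in "pre-trained" feature spaces.
rng = np.random.default_rng(0)
n_classes, shots = 3, 4
train_labels = np.repeat(np.arange(n_classes), shots)
test_labels = np.arange(n_classes)
centers = [3.0 * rng.normal(size=(n_classes, d)) for d in (8, 16)]
train_feats = [c[train_labels] + 0.1 * rng.normal(size=(len(train_labels), c.shape[1]))
               for c in centers]
test_feats = [c[test_labels] + 0.1 * rng.normal(size=(len(test_labels), c.shape[1]))
              for c in centers]
train_onehot = np.eye(n_classes)[train_labels]
# Placeholder prior mean; CLIP zero-shot class probabilities would go here.
train_prior = np.full((len(train_labels), n_classes), 1.0 / n_classes)
test_prior = np.full((len(test_labels), n_classes), 1.0 / n_classes)
weights = [1.0, 1.0]

mean, var = gp_label_regression(train_feats, train_onehot, train_prior,
                                test_feats, test_prior, weights)
pred = mean.argmax(axis=1)
```

The predicted class is the argmax of the posterior mean, and `var` gives a per-test-point uncertainty shared across classes because every class column uses the same kernel. In a fuller version the kernel weights, lengthscales, and noise would be tuned by maximizing the GP marginal likelihood rather than fixed by hand.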

Unified View Empirical Study for Large Pretrained Model on Cross-Domain Few-Shot Learning

BayesAdapter: enhanced uncertainty estimation in CLIP few-shot adaptation

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Transductive Zero-Shot and Few-Shot CLIP

Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization

PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts

Bayesian Evidential Learning for Few-Shot Classification

Prototype Bayesian Meta-Learning for Few-Shot Image Classification

A Closer Look at the Robustness of Contrastive Language-Image Pre-Training (CLIP)

Text-Guided Mixup Towards Long-Tailed Image Categorization

An Integrated Model for Bayesian Learning of Sparse Representation and Classifier Training

Multimodal CLIP Inference for Meta-Few-Shot Image Classification

Open-Vocabulary Multi-label Image Classification with Pretrained Vision-Language Model

CLIP-guided Black-Box Domain Adaptation of Image Classification

Toward a Holistic Evaluation of Robustness in CLIP Models

Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition

A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation