Abstract: Deep learning models can extract rich knowledge from large-scale datasets. However, sharing data has become increasingly difficult because of copyright and privacy concerns, which hampers the transfer of knowledge from existing data to novel downstream tasks and concepts. Zero-shot learning (ZSL) approaches aim to recognize new classes by transferring semantic knowledge learned from base classes. However, traditional generative ZSL methods often require access to real images from base classes and rely on manually annotated attributes, which makes them hard to apply under data restrictions and limits their scalability. To this end, this paper tackles a challenging and practical problem dubbed data-free zero-shot learning (DFZSL), where only a CLIP-based classifier pre-trained on the base-class data is available for zero-shot classification. Specifically, we propose a generic framework for DFZSL that consists of three main components. First, to recover virtual features of the base data, we model the CLIP features of base-class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier. Second, we leverage the text features of CLIP as low-cost semantic information and propose a feature-language prompt tuning (FLPT) method to further align the virtual image features with the textual features. Third, we train a conditional generative model on the well-aligned virtual image features and the corresponding semantic text features, enabling the generation of features for new classes and achieving better zero-shot generalization. Our framework has been evaluated on five commonly used benchmarks for generalized ZSL, as well as 11 benchmarks for base-to-new ZSL. The results demonstrate the superiority and effectiveness of our approach. Our code is available in
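
The first component, recovering virtual base-class features from a vMF distribution, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that each class's L2-normalized classifier weight vector serves as the vMF mean direction and that the concentration `kappa` is a fixed hand-picked value; the abstract does not say how either is obtained. The names `sample_vmf` and `virtual_features` are hypothetical, and the sampler is the standard rejection scheme of Wood (1994), written with the Python standard library only.

```python
import math
import random


def sample_vmf(mu, kappa, rng):
    """Draw one unit vector from vMF(mu, kappa) with Wood's rejection sampler."""
    d = len(mu)  # feature dimension, must be >= 2
    norm = math.sqrt(sum(m * m for m in mu))
    mu = [m / norm for m in mu]  # the mean direction must be a unit vector

    # Step 1: rejection-sample w, the cosine between the sample and mu.
    b = (d - 1) / (2.0 * kappa + math.sqrt(4.0 * kappa ** 2 + (d - 1) ** 2))
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * math.log(1.0 - x0 ** 2)
    while True:
        z = rng.betavariate((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if kappa * w + (d - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break

    # Step 2: pick a uniform direction orthogonal to the first axis e1.
    v = [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
    v_norm = math.sqrt(sum(t * t for t in v))
    scale = math.sqrt(max(0.0, 1.0 - w * w)) / v_norm
    x = [w] + [scale * t for t in v]

    # Step 3: Householder reflection that maps e1 onto mu.
    h = [-m for m in mu]
    h[0] += 1.0  # h = e1 - mu
    h_norm_sq = sum(t * t for t in h)
    if h_norm_sq > 1e-12:  # skip when mu already equals e1
        proj = 2.0 * sum(hi * xi for hi, xi in zip(h, x)) / h_norm_sq
        x = [xi - proj * hi for xi, hi in zip(x, h)]
    return x


def virtual_features(classifier_weights, kappa, n_per_class, seed=0):
    """Sample n_per_class virtual features per class.

    classifier_weights maps a class name to its weight vector (assumed here
    to play the role of the vMF mean direction).
    """
    rng = random.Random(seed)
    return {
        name: [sample_vmf(w, kappa, rng) for _ in range(n_per_class)]
        for name, w in classifier_weights.items()
    }


if __name__ == "__main__":
    # Toy 8-dimensional "classifier" with two classes (made-up numbers).
    weights = {
        "cat": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "dog": [0.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    }
    feats = virtual_features(weights, kappa=50.0, n_per_class=3)
    for name, samples in feats.items():
        print(name, [round(t, 3) for t in samples[0]])
```

Every sample has unit norm, so it can stand in for a normalized CLIP image feature, and a larger `kappa` concentrates the samples more tightly around the class direction. In a real setting the dimension would be that of the CLIP embedding rather than the toy value of 8 used here.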
