Abstract: Generalized Zero-Shot Learning (GZSL) is an important research problem because it enables the classification of samples from both seen and unseen classes. A prevailing approach to GZSL is to learn transferable representations that generalize well to both seen and unseen classes at test time. This approach rests on two key concepts: discriminative representations and semantic-relevant representations. "Semantic-relevant" representations facilitate the transfer of semantic knowledge through pre-defined semantic descriptors, while "discriminative" representations are crucial for accurate category discrimination. However, these two concepts are arguably inherently conflicting, as semantic descriptors are not specifically designed for image classification. Existing methods often struggle to balance the two and neglect the conflict between them, leading to suboptimal generalization and poor transferability to unseen classes. To address this issue, we propose a novel partially-shared multi-task representation learning method, termed PS-GZSL, which jointly preserves the complementary and sharable knowledge between these two concepts. Specifically, we first propose a novel perspective that treats the learning of discriminative and semantic-relevant representations as optimizing a discrimination task and a visual-semantic alignment task, respectively. Then, to learn more complete and generalizable representations, PS-GZSL explicitly factorizes visual features into task-shared and task-specific representations and introduces two advanced tasks: an instance-level contrastive discrimination task and a relation-based visual-semantic alignment task. Furthermore, PS-GZSL employs Mixture-of-Experts (MoE) with a dropout mechanism to prevent representation degeneration and integrates a conditional GAN (cGAN) to synthesize visual features for unseen classes. Extensive experiments on five widely used GZSL benchmark datasets yield competitive results and validate the effectiveness of PS-GZSL.
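
The partially-shared factorization described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the three-branch layout, the linear experts, the softmax gate, the expert-level form of dropout, and all names (`moe_branch`, `factorize`, `init_branch`, the dimensions). It is not the authors' implementation. It only shows how one task-shared representation could be combined with two task-specific ones.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, weights):
    """Bias-free linear map: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_branch(x, branch, drop_prob, rng, training):
    """Gate-weighted mixture of linear experts with expert-level dropout.

    Assumed form of "MoE with dropout": during training each expert is
    dropped with probability `drop_prob`, and at least one is always kept.
    """
    scores = linear(x, branch["gate"])
    keep = [True] * len(scores)
    if training:
        keep = [rng.random() >= drop_prob for _ in scores]
        if not any(keep):
            keep[rng.randrange(len(keep))] = True
    # Mask dropped experts before normalising so kept weights sum to 1.
    masked = [s if k else float("-inf") for s, k in zip(scores, keep)]
    weights = softmax(masked)
    outputs = [linear(x, expert) for expert in branch["experts"]]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(outputs[0]))]


def init_branch(feat_dim, rep_dim, num_experts, rng):
    """One branch = one gate plus `num_experts` linear experts (hypothetical)."""
    return {
        "gate": random_matrix(num_experts, feat_dim, rng),
        "experts": [random_matrix(rep_dim, feat_dim, rng)
                    for _ in range(num_experts)],
    }


def factorize(x, branches, drop_prob=0.3, rng=None, training=True):
    """Split a visual feature into task-shared and task-specific parts."""
    rng = rng or random.Random()
    shared = moe_branch(x, branches["shared"], drop_prob, rng, training)
    disc = moe_branch(x, branches["discrimination"], drop_prob, rng, training)
    sem = moe_branch(x, branches["alignment"], drop_prob, rng, training)
    # Each task sees the shared part plus its own specific part.
    return {"discrimination": shared + disc, "alignment": shared + sem}


rng = random.Random(0)
branches = {name: init_branch(8, 4, 3, rng)
            for name in ("shared", "discrimination", "alignment")}
feature = [rng.uniform(-1.0, 1.0) for _ in range(8)]
reps = factorize(feature, branches, drop_prob=0.3, rng=rng)
```

In this sketch, `reps["discrimination"]` would feed a contrastive discrimination loss and `reps["alignment"]` a visual-semantic alignment loss. Both inputs start with the same shared block, so gradients from both tasks would update the shared branch while each specific branch is shaped by one task only.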

Global-local prompts guided image-text embedding, alignment and aggregation for multi-label zero-shot learning

Dual Collaborative Visual-Semantic Mapping for Multi-Label Zero-Shot Image Recognition

Joint Learning of Attended Zero-Shot Features and Visual-Semantic Mapping

Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

Global and Local Semantic Completion Learning for Vision-Language Pre-training

Language-Augmented Pixel Embedding for Generalized Zero-Shot Learning

Multi-modal Generative Adversarial Network for Zero-Shot Learning

Manifold Regularized Cross-Modal Embedding for Zero-Shot Learning

Global-Local Interplay in Semantic Alignment for Few-Shot Learning

Text-Video Retrieval with Global-Local Semantic Consistent Learning

Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning

Multi-label zero-shot learning with graph convolutional networks

SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition

Semantic-guided Reinforced Region Embedding for Generalized Zero-Shot Learning

GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection

A Transferable Generative Framework for Multi-Label Zero-Shot Learning

Generalized Zero-Shot Image Classification via Partially-Shared Multi-Task Representation Learning

ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning

Large Language Models are Good Prompt Learners for Low-Shot Image Classification