Abstract: Generalized Zero-Shot Learning (GZSL) and Open-Set Recognition (OSR) are two mainstream settings that greatly extend conventional visual object recognition. However, the limitations of their problem settings are not negligible. The novel categories in GZSL require pre-defined semantic labels, making the problem setting less realistic; the oversimplified unknown class in OSR fails to explore the innate fine-grained and mixed structures of novel categories. In light of this, we are motivated to consider a new problem setting named Zero-Knowledge Zero-Shot Learning (ZK-ZSL) that assumes no prior knowledge of novel classes and aims to classify seen and unseen samples and recover semantic attributes of the fine-grained novel categories for further interpretation. To achieve this, we propose a novel framework that recovers the clustering structures of both seen and unseen categories where the seen class structures are guided by source labels. In addition, a structural alignment loss is designed to aid the semantic learning of unseen categories with their recovered structures. Experimental results demonstrate our method's superior performance in classification and semantic recovery on four benchmark datasets.
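The abstract does not give the form of the structural alignment loss. The sketch below works under stated assumptions: the "clustering structure" in each space is represented by DEC-style Student's t soft assignments to K cluster centres, cluster j in the embedding space corresponds to cluster j in the semantic space, and the two structures are compared with a mean KL divergence. The function names (`soft_assign`, `mean_kl`, `structural_alignment_loss`) are hypothetical; this is an illustration, not the paper's implementation.

```python
import math


def soft_assign(points, centers, alpha=1.0):
    """Student's t soft assignment (DEC-style): row i holds q_ij, the
    probability that point i belongs to cluster j; each row sums to 1."""
    rows = []
    for x in points:
        weights = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(x, mu))
            weights.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def mean_kl(p_rows, q_rows, eps=1e-12):
    """Mean over samples of KL(P_i || Q_i); eps guards against log(0)."""
    total = 0.0
    for p, q in zip(p_rows, q_rows):
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / len(p_rows)


def structural_alignment_loss(z, z_centers, s, s_centers):
    """Hypothetical structural alignment: compare how samples cluster in the
    embedding space (z) with how their predicted semantic vectors (s) cluster,
    assuming cluster j in one space matches cluster j in the other."""
    q_emb = soft_assign(z, z_centers)
    q_sem = soft_assign(s, s_centers)
    return mean_kl(q_emb, q_sem)


# Usage: matching structures give zero loss; swapped structures do not.
centers = [[0.0, 0.0], [5.0, 5.0]]
z = [[0.0, 0.0], [5.0, 5.0]]
aligned = structural_alignment_loss(z, centers, z, centers)
swapped = structural_alignment_loss(z, centers, [[5.0, 5.0], [0.0, 0.0]], centers)
print(aligned, swapped)  # aligned == 0.0, swapped > 0
```

Minimizing such a term pushes the predicted semantics of unseen-class samples to group the same way their embeddings do, which is the intuition the abstract describes.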

## What problem does this paper attempt to address?

The paper addresses how to discover novel classes in visual recognition, and recover their semantic attributes, without any prior knowledge of those classes. Specifically, it proposes a new problem setting, Zero-Knowledge Zero-Shot Learning (ZK-ZSL), which aims to classify both seen and unseen samples without prior knowledge of the novel classes and to recover the fine-grained semantic attributes of the unseen classes.

### Background and Motivation

Traditional visual recognition assumes that the training and test sets share the same label space. This assumption is often hard to satisfy in practice, because labeled data for every category may be scarce or unavailable. Two settings have been proposed to relax it, Generalized Zero-Shot Learning (GZSL) and Open-Set Recognition (OSR), but both have limitations:

- **GZSL**: It requires pre-defined semantic labels for the novel classes, which makes the setting less realistic.
- **OSR**: It treats all unknown classes as a single class, so it does not explore their fine-grained internal structure. It also ignores the semantic attributes of visual classes, which limits in-depth analysis and interpretation of the novel classes.

### Proposed Method

To overcome these limitations, the paper proposes ZK-ZSL: identifying seen and unseen classes and recovering the semantic attributes of the unseen classes without prior knowledge of them. To this end, the authors design a framework with three main components:

1. **Source-Guided Clustering**: Recovers the clustering structure of both seen and unseen classes in the target dataset, with the clusters of the seen classes guided by the source labels.
2. **Semantic Prediction**: Maps the latent embeddings to the semantic space.
3. **Structural Alignment**: Supports the semantic learning of unseen classes by minimizing the difference between the clustering structures in the embedding space and in the semantic space.

### Loss Function

The authors train the model with three main losses:

1. **Source-Guided Clustering Loss**, which has three terms:
   - **Self-Reconstruction Loss**: Helps the encoder and decoder extract clustering-friendly latent embeddings while minimizing information loss.
   - **Clustering Regularization Loss**: Pulls data points toward their likely cluster centers by minimizing the difference between the cluster assignment distribution and a target distribution.
   - **Source Centroid Alignment Loss**: Pulls source samples toward the centers of their corresponding clusters, reducing the distribution shift of the seen classes between the source and target datasets.
2. **Semantic Prediction Loss**: Uses a pairwise ranking loss to learn the mapping from the embedding space to the semantic space.
3. **Structural Alignment Loss**: Reduces overfitting to the source semantic data and helps predict the semantic attributes of unseen classes accurately, by minimizing the difference between the clustering structures in the embedding space and in the semantic space.

### Experimental Results

The authors evaluate on four benchmark datasets: APY, CUB, AWA2, and SUN. The method achieves strong classification and semantic-recovery accuracy, especially on APY and AWA2, and outperforms the baseline methods overall. Confusion matrices and t-SNE visualizations further support its effectiveness.

### Conclusion

The paper introduces the ZK-ZSL setting together with a framework that identifies seen and unseen classes and recovers the semantic attributes of unseen classes without prior knowledge of the novel classes.
The experimental results show that it outperforms the compared baselines on four benchmark datasets.
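The clustering regularization and source centroid alignment terms described under Loss Function can be sketched as follows. The summary does not give their formulas, so this assumes a DEC-style regularizer (Student's t soft assignments and a sharpened target distribution, compared with KL divergence) and, for the alignment term, a squared-distance penalty with seen class c tied to cluster index c. All names are hypothetical, and the code is plain Python so that the arithmetic is visible.

```python
import math


def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embedding i to cluster j (rows sum to 1)."""
    q = []
    for x in z:
        w = [(1.0 + sum((a - b) ** 2 for a, b in zip(x, mu)) / alpha) ** (-(alpha + 1.0) / 2.0)
             for mu in centers]
        s = sum(w)
        q.append([v / s for v in w])
    return q


def target_distribution(q):
    """DEC-style target: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij.
    Squaring sharpens confident assignments; dividing by f_j offsets large clusters."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def clustering_regularization_loss(z, centers):
    """Mean KL(P || Q) between the target and the current soft assignments.
    In DEC, P is recomputed periodically and held fixed during gradient steps."""
    q = soft_assign(z, centers)
    p = target_distribution(q)
    kl = sum(pij * math.log(pij / qij)
             for prow, qrow in zip(p, q) for pij, qij in zip(prow, qrow))
    return kl / len(z)


def source_centroid_alignment_loss(z_src, labels, centers):
    """Mean squared distance from each labelled source embedding to the centre
    of its class, assuming seen class c is tied to cluster index c."""
    total = 0.0
    for x, y in zip(z_src, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, centers[y]))
    return total / len(z_src)


centers = [[0.0, 0.0], [4.0, 4.0]]
z = [[0.5, 0.0], [1.0, 1.0], [3.5, 4.0]]
print(clustering_regularization_loss(z, centers))
print(source_centroid_alignment_loss([[1.0, 0.0], [4.0, 6.0]], [0, 1], centers))  # → 2.5
```

In practice both terms would be computed on mini-batches of encoder outputs inside an autodiff framework; this version only shows how each quantity is formed.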
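For the semantic prediction loss, the summary says only that a pairwise ranking loss maps embeddings to the semantic space. One common form, assumed here, is a hinge loss that requires the predicted semantic vector to score its true class's attribute vector at least a margin above every other seen class's attribute vector, using a dot-product compatibility. The margin of 1.0, the dot product, and the function name are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pairwise_ranking_loss(pred_sem, labels, class_attrs, margin=1.0):
    """Hinge ranking loss: for each sample, the attribute vector of its true
    class should score at least `margin` higher (by dot product) than the
    attribute vector of every other seen class."""
    total = 0.0
    for s, y in zip(pred_sem, labels):
        true_score = dot(s, class_attrs[y])
        for c, attrs in enumerate(class_attrs):
            if c != y:
                total += max(0.0, margin - true_score + dot(s, attrs))
    return total / len(pred_sem)


class_attrs = [[1.0, 0.0], [0.0, 1.0]]  # toy attribute vectors for two seen classes
print(pairwise_ranking_loss([[2.0, 0.0]], [0], class_attrs))  # → 0.0 (true class leads by 2 > margin)
print(pairwise_ranking_loss([[0.0, 1.0]], [0], class_attrs))  # → 2.0 (wrong class ranked higher)
```

Because only source classes have attribute labels, a loss like this can only be applied to seen classes, which is why the summary pairs it with a structural alignment term for the unseen ones.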

Zero-Knowledge Zero-Shot Learning for Novel Visual Category Discovery
