Zero-Knowledge Zero-Shot Learning for Novel Visual Category Discovery

Zhaonan Li, Hongfu Liu
DOI: https://doi.org/10.48550/arXiv.2302.04427
2023-02-09
Abstract: Generalized Zero-Shot Learning (GZSL) and Open-Set Recognition (OSR) are two mainstream settings that greatly extend conventional visual object recognition. However, the limitations of their problem settings are not negligible. The novel categories in GZSL require pre-defined semantic labels, making the problem setting less realistic; the oversimplified unknown class in OSR fails to explore the innate fine-grained and mixed structures of novel categories. In light of this, we are motivated to consider a new problem setting named Zero-Knowledge Zero-Shot Learning (ZK-ZSL) that assumes no prior knowledge of novel classes and aims to classify seen and unseen samples and recover semantic attributes of the fine-grained novel categories for further interpretation. To achieve this, we propose a novel framework that recovers the clustering structures of both seen and unseen categories where the seen class structures are guided by source labels. In addition, a structural alignment loss is designed to aid the semantic learning of unseen categories with their recovered structures. Experimental results demonstrate our method's superior performance in classification and semantic recovery on four benchmark datasets.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to discover novel classes in visual recognition, and recover their semantic attributes, without any prior knowledge of those classes. Specifically, it proposes a new problem setting, Zero-Knowledge Zero-Shot Learning (ZK-ZSL), which aims to classify both seen and unseen samples without prior knowledge of the novel classes and to recover the fine-grained semantic attributes of the unseen classes.

### Background and Motivation

Conventional visual recognition usually assumes that the training and test sets share the same label space. This assumption is often hard to satisfy in practice, because labeled real-world data for every category may be scarce or unavailable. Two settings have been proposed to relax it, Generalized Zero-Shot Learning (GZSL) and Open-Set Recognition (OSR), but both have limitations:

- **GZSL**: It requires pre-defined semantic labels for the novel classes, which makes the setting less realistic.
- **OSR**: It treats all unknown classes as a single class, so it neither explores their internal fine-grained structure nor considers the semantic attributes of visual classes. This limits in-depth analysis and interpretation of the novel classes.

### Proposed Method

To overcome these limitations, the paper proposes ZK-ZSL: classify seen and unseen classes and recover the semantic attributes of the unseen classes without prior knowledge of the novel classes. To this end, the authors design a framework with three main components:

1. **Source-Guided Clustering**: Recovers the clustering structure of both seen and unseen classes in the target dataset, with the clusters of seen classes guided by source labels.
2. **Semantic Prediction**: Maps the latent embeddings to the semantic space.
3. **Structural Alignment**: Supports semantic learning for unseen classes by minimizing the discrepancy between the clustering structures in the embedding space and in the semantic space.

### Loss Function

The authors train the model with three main losses:

1. **Source-Guided Clustering Loss**, which combines three terms:
   - **Self-Reconstruction Loss**: Helps the encoder and decoder extract clustering-friendly latent embeddings while minimizing information loss.
   - **Clustering Regularization Loss**: Pulls data points towards their likely cluster centers by minimizing the difference between the cluster-assignment distribution and a target distribution.
   - **Source Centroid Alignment Loss**: Pulls source samples towards their corresponding cluster centers, reducing the distribution shift between the seen classes in the source and target datasets.
2. **Semantic Prediction Loss**: A pairwise ranking loss that learns the mapping from the embedding space to the semantic space.
3. **Structural Alignment Loss**: Minimizes the discrepancy between the clustering structures in the embedding space and in the semantic space, which prevents overfitting to the source semantic data and helps predict the semantic attributes of unseen classes accurately.

### Experimental Results

The authors run experiments on four benchmark datasets: APY, CUB, AWA2, and SUN. The results show that the proposed method outperforms the baseline methods overall in both classification and semantic recovery accuracy, with particularly strong results on APY and AWA2. Confusion matrices and t-SNE visualizations provide further evidence of the method's effectiveness.

### Conclusion

The paper proposes a ZK-ZSL framework that effectively identifies seen and unseen classes and recovers the semantic attributes of unseen classes without prior knowledge of the novel classes.
The experimental results demonstrate the method's superior performance on multiple benchmark datasets.
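To make the terms listed under Loss Function more concrete, here is a small, dependency-free Python sketch. It is an illustration under stated assumptions, not the authors' implementation, since the summary does not give the exact formulas. The clustering regularization assumes the widely used DEC-style formulation (a Student's t soft assignment pushed towards a sharpened target distribution via KL divergence); the semantic prediction loss is a generic margin-based pairwise ranking hinge; and `structural_alignment` is a hypothetical form that compares soft cluster assignments computed in the embedding space and in the semantic space. All function names and the `margin` parameter are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def reconstruction_loss(x: List[Vector], x_hat: List[Vector]) -> float:
    """Self-reconstruction: mean squared error between inputs and decoder outputs."""
    return sum(sq_dist(a, b) for a, b in zip(x, x_hat)) / len(x)


def soft_assign(z: List[Vector], centroids: List[Vector]) -> Matrix:
    """Assumed DEC-style soft assignment: q_ij proportional to 1 / (1 + ||z_i - mu_j||^2)."""
    q = []
    for zi in z:
        k = [1.0 / (1.0 + sq_dist(zi, mu)) for mu in centroids]
        total = sum(k)
        q.append([v / total for v in k])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened target: p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def mean_kl(p: Matrix, q: Matrix, eps: float = 1e-12) -> float:
    """Average over rows of KL(p_i || q_i); eps guards against log(0)."""
    kl = 0.0
    for pi, qi in zip(p, q):
        kl += sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(pi, qi))
    return kl / len(p)


def clustering_regularization(z: List[Vector], centroids: List[Vector]) -> float:
    """KL(P || Q) between the sharpened target P and the soft assignments Q."""
    q = soft_assign(z, centroids)
    return mean_kl(target_distribution(q), q)


def source_centroid_alignment(z_src: List[Vector], labels: List[int],
                              centroids: List[Vector]) -> float:
    """Mean squared distance from each labeled source embedding to its class centroid.
    Assumes (hypothetically) that seen class c is tied to centroid index c."""
    return sum(sq_dist(zi, centroids[y]) for zi, y in zip(z_src, labels)) / len(z_src)


def pairwise_ranking_loss(s_pred: List[Vector], labels: List[int],
                          class_attrs: List[Vector], margin: float = 1.0) -> float:
    """Hinge ranking: the true class's attribute vector should outscore every
    other class's attribute vector by at least `margin`."""
    total = 0.0
    for s, y in zip(s_pred, labels):
        true_score = dot(s, class_attrs[y])
        for c, attrs in enumerate(class_attrs):
            if c != y:
                total += max(0.0, margin - true_score + dot(s, attrs))
    return total / len(s_pred)


def structural_alignment(z: List[Vector], emb_centroids: List[Vector],
                         s_pred: List[Vector], sem_centroids: List[Vector]) -> float:
    """Hypothetical form: KL between soft cluster assignments in the embedding
    space and in the semantic space (same number of clusters in both)."""
    q_emb = soft_assign(z, emb_centroids)
    q_sem = soft_assign(s_pred, sem_centroids)
    return mean_kl(q_emb, q_sem)


if __name__ == "__main__":
    # Predicted semantics [0, 1] for a sample of class 0 ranks the wrong class first.
    print(pairwise_ranking_loss([[0.0, 1.0]], [0], [[1.0, 0.0], [0.0, 1.0]]))  # 2.0
```

Plain lists keep the sketch runnable without extra dependencies; a real model would compute these terms on tensors in a framework with automatic differentiation so that the encoder, decoder, cluster centers, and semantic projection can be optimized jointly. How the paper weights and combines the terms is not stated in this summary.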