Abstract: Zero-shot learning aims to recognize objects that do not appear in the training dataset. Prevalent mapping-based zero-shot learning methods suffer from the projection domain shift problem because images of the unseen classes are unavailable during training. To alleviate this problem, this paper proposes a deep unbiased embedding transfer (DUET) model. The DUET model is composed of a deep embedding transfer (DET) module and an unseen visual feature generation (UVG) module. In the DET module, a novel combined embedding transfer net, which integrates the complementary merits of linear and nonlinear embedding mapping functions, connects the visual space and the semantic space. Moreover, the visual feature extractor and the combined embedding transfer net are trained simultaneously in an end-to-end joint training process. In the UVG module, a visual feature generator trained within a conditional generative adversarial framework synthesizes visual features of the unseen classes to mitigate the effect of projection domain shift. Furthermore, a quantitative index, the score of resistance on domain shift (ScoreRDS), is proposed to evaluate how well different models resist the projection domain shift problem. Experiments on five zero-shot learning benchmarks verify the effectiveness of the proposed DUET model. Qualitative and quantitative analyses show that unseen-class visual feature generation, the combined embedding transfer net, and the end-to-end joint training process all contribute to alleviating projection domain shift in zero-shot learning.
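The idea of combining a linear and a nonlinear embedding mapping can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the mapping direction (visual to semantic), the fusion by summation, the single tanh hidden layer, the cosine nearest-prototype classifier, and all names (`combined_embed`, `predict`, `W_lin`, `W1`, `W2`). It should not be read as the DUET implementation.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def combined_embed(x, W_lin, W1, W2):
    """Hypothetical combined mapping from visual to semantic space.

    Assumption: the linear and nonlinear branches are fused by summation.
    """
    linear = matvec(W_lin, x)                        # linear branch
    hidden = [math.tanh(h) for h in matvec(W1, x)]   # nonlinear hidden layer
    nonlinear = matvec(W2, hidden)                   # nonlinear branch output
    return [a + b for a, b in zip(linear, nonlinear)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(x, prototypes, W_lin, W1, W2):
    """Return the class whose semantic prototype is closest to the embedded x."""
    z = combined_embed(x, W_lin, W1, W2)
    return max(prototypes, key=lambda name: cosine(z, prototypes[name]))


# Toy usage with made-up weights and two unseen-class prototypes.
W_lin = [[1.0, 0.0], [0.0, 1.0]]
W1 = [[0.5, -0.5], [0.25, 0.25]]
W2 = [[0.1, 0.0], [0.0, 0.1]]
prototypes = {"zebra": [1.0, 0.0], "horse": [0.0, 1.0]}
print(predict([0.9, 0.1], prototypes, W_lin, W1, W2))
```

In this sketch the linear branch preserves the overall geometry of the input, while the nonlinear branch adds a learned correction. In practice both branches would be trained jointly with the feature extractor rather than given fixed weights.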

Deep Unbiased Embedding Transfer for Zero-shot Learning