Zero-Shot Recognition Based on Semantic Embeddings and Deep Clustering

Zhaohui Liu, Jianjun Tan, Bingli Jiao
DOI: https://doi.org/10.1145/3456415.3457221
2021-01-01
Abstract: This work introduces a model that can discover new objects in images even when no training data is available for the object class. Unlike zero-shot learning models, which need text corpora about the new objects, our model can operate on a mixture of seen and unseen classes with no extra information while still obtaining reasonable performance on unseen classes. This is achieved by treating the distributions of words in text as a semantic space for understanding what objects look like. Our deep learning model adapts to the presence or absence of manually annotated semantic or visual feature vectors, and needs no text corpus other than the semantic space obtained from the training set. Images are mapped close to the semantic word vectors of their classes in the training set and are then clustered. The resulting image embeddings can be used to determine whether an image belongs to a seen or an unseen class. We train a novel neural network structure to discover new objects, which considers two aspects: the accuracy of the mapping in the semantic space and the suitability of the embeddings for clustering.
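The seen-versus-unseen decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' network or loss. It assumes that images have already been mapped into the semantic space, and it substitutes a simple distance threshold for the learned mapping and clustering criterion. The function names, the toy vectors, and the threshold value are all hypothetical.

```python
import numpy as np


def nearest_class_distance(embeddings, class_vectors):
    """Return, per image, the index of and distance to the nearest seen-class vector."""
    # Pairwise Euclidean distances, shape (n_images, n_classes).
    diffs = embeddings[:, None, :] - class_vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = dists.argmin(axis=1)
    return nearest, dists[np.arange(len(embeddings)), nearest]


def flag_unseen(embeddings, class_vectors, threshold):
    """Flag an image as unseen when it lies far from every seen-class vector.

    Assumption: a fixed distance threshold stands in for the paper's
    learned criterion, which is not specified in the abstract.
    """
    nearest, dist = nearest_class_distance(embeddings, class_vectors)
    return nearest, dist > threshold


# Toy 2-D "semantic space" with two seen classes (hypothetical values).
class_vectors = np.array([[0.0, 0.0], [10.0, 0.0]])

# Three image embeddings: two near a seen class, one far from both.
embeddings = np.array([[0.1, -0.2], [9.8, 0.3], [4.0, 8.0]])

nearest, is_unseen = flag_unseen(embeddings, class_vectors, threshold=2.0)
print(nearest.tolist())    # nearest seen class per image
print(is_unseen.tolist())  # True where the image is flagged as unseen
```

In this toy setup the first two embeddings fall within the threshold of a seen-class vector, while the third is far from both and is flagged as a candidate for an unseen class. In practice the threshold would have to be chosen on held-out data.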