Semantic Enhanced Cross-modal GAN for Zero-shot Learning

Haotian Sun,Jiwei Wei,Yang,Xing Xu
DOI: https://doi.org/10.1145/3469877.3490581
2021-01-01
Abstract:The goal of Zero-shot Learning (ZSL) is to recognize categories that are not seen during the training process. The traditional method is to learn an embedding space and map visual features and semantic features to this common space. However, this method inevitably encounters the bias problem, i.e., unseen instances are often incorrectly recognized as the seen classes. Some attempts are made by proposing another paradigm, which uses generative models to hallucinate the features of unseen samples. However, the generative models often suffer from instability issues, making it impractical for them to generate fine-grained features of unseen samples, thus resulting in very limited improvement. To resolve this, a Semantic Enhanced Cross-modal GAN (SECM GAN) is proposed by imposing the cross-modal association for improving the semantic and discriminative property of the generated features. Specifically, we first train a cross-modal embedding model called Semantic Enhanced Cross-modal Model (SECM), which is constrained by discrimination and semantics. Then we train our generative model based on Generative Adversarial Network (GAN) called SECM GAN, in which the generator generates cross-modal features, and the discriminator distinguishes true cross-modal features from generated cross-modal features. We deploy SECM as a weak constraint of GAN, which makes reliance on GAN get reduced. We evaluate extensive experiments on three widely used ZSL datasets to demonstrate the superiority of our framework.
What problem does this paper attempt to address?