Zero-Shot Image Classification with Rectified Embedding Vectors Using a Caption Generator

Chan Hur, Hyeyoung Park
DOI: https://doi.org/10.2139/ssrn.4061391
2022-01-01
SSRN Electronic Journal
Abstract: Although image recognition technologies are developing rapidly with deep learning, conventional recognition models do not work well when the test classes are not included in the classes of the training set. To overcome this limitation, zero-shot learning has been studied recently. Because a zero-shot image classifier cannot obtain any visual information about the test classes during learning, it needs to match the feature of a test image with a conceptual representation of the test class. The joint embedding method has been proposed as a solution, but it suffers from inconsistency between the distributions of the two feature sets extracted from heterogeneous inputs. To address this problem, we propose a novel method that employs additional textual information to rectify the visual representation of input images. Since the conceptual information of test classes is generally given as text, we expect that additional descriptions from a caption generator can adjust the visual feature for better matching with the representation of the test classes. We also propose using the generated textual descriptions to augment training samples for learning the joint embedding space. In experiments on two benchmark datasets, the proposed method shows notable improvement over existing models.
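The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the rectification is computed, so the convex combination of image and caption embeddings below is purely an illustrative assumption, not the authors' method. The names `rectify`, `zero_shot_predict`, and `alpha`, as well as the toy vectors, are hypothetical; a real system would obtain the vectors from trained image, caption, and class-description encoders that share one embedding space.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def rectify(image_vec, caption_vec, alpha=0.5):
    """Hypothetical rectification: blend the image embedding with the
    embedding of a generated caption (assumed to live in the same space).
    alpha = 0 keeps the image embedding unchanged."""
    blended = (1.0 - alpha) * l2_normalize(image_vec) + alpha * l2_normalize(caption_vec)
    return l2_normalize(blended)


def zero_shot_predict(query_vec, class_vecs):
    """Return the index of the class embedding most similar to the query."""
    sims = l2_normalize(class_vecs) @ l2_normalize(query_vec)
    return int(np.argmax(sims))


# Toy embeddings (assumed values, for illustration only).
class_vecs = np.array([[1.0, 0.0, 0.0],   # unseen class 0
                       [0.0, 1.0, 0.0]])  # unseen class 1
image_vec = np.array([0.6, 0.5, 0.1])     # visual feature of a test image
caption_vec = np.array([0.1, 0.9, 0.0])   # embedding of its generated caption

plain_pred = zero_shot_predict(image_vec, class_vecs)
rectified_pred = zero_shot_predict(rectify(image_vec, caption_vec), class_vecs)
```

In this toy setup the visual feature alone lies slightly closer to class 0, while the caption-adjusted vector lies closer to class 1, which shows how extra textual information could in principle shift a visual feature toward the text-derived class representation.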