Learning Visual-and-semantic Knowledge Embedding for Zero-Shot Image Classification

Kong Dehui, Li Xiliang, Wang Shaofan, Li Jinghua, Yin Baocai
DOI: https://doi.org/10.1007/s10489-022-03443-1
IF: 5.3
2022-01-01
Applied Intelligence
Abstract: Recently, several classifier prediction methods have emerged that exploit knowledge graphs and Graph Convolutional Neural Networks (GCNs), achieving excellent results in Zero-Shot Learning (ZSL). However, existing methods rely only on pre-trained seen-class classifier parameters to guide the model’s training, which prevents discriminative visual features from being mined and does not guarantee effective use of semantic features. Therefore, this work presents a novel Knowledge-Assisted ZSL Model (KAZSLM), which improves classification ability by embedding visual and semantic information into the classifier space. A GCN classifier prediction network driven by word embeddings and inter-class relationships is employed as the Basic Framework (BF), which is then combined with a Visual Knowledge Assistant (VKA) module and a Semantic Knowledge Assistant (SKA) module to form KAZSLM. In the VKA module, the average visual feature of all the samples in each seen class, together with the corresponding class label, is used to guide the model to refine the classifier at a lower computational cost. In the SKA module, the semantic features of each class are used to refine the classifier through a GCN whose loss function reconstructs each class’s semantic features from the corresponding classifier parameters. These two assistant modules allow visual and semantic knowledge to push the whole model toward a more precise classifier. Moreover, a simple convolutional residual network is adopted to further improve the model’s performance on the AWA2 dataset. Experimental results on the AWA2 and ImageNet datasets demonstrate that KAZSLM achieves better image classification performance than current ZSL classification methods.
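The VKA idea described in the abstract (supervising the predicted classifiers with one averaged visual feature per seen class rather than with every sample) can be illustrated with a minimal NumPy sketch. The abstract does not specify the loss, so the softmax cross-entropy used here is an assumption, and the function names `class_mean_features` and `prototype_cross_entropy` are hypothetical, not taken from the paper.

```python
import numpy as np

def class_mean_features(features, labels, num_classes):
    """Average the visual features of all samples in each seen class.

    Assumes every class index in [0, num_classes) has at least one sample.
    features: (N, D) array, labels: (N,) integer array.
    """
    means = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        means[c] = features[labels == c].mean(axis=0)
    return means

def prototype_cross_entropy(classifiers, prototypes):
    """Assumed VKA-style loss: class mean c should be classified as class c.

    classifiers: (C, D) predicted classifier weights, one row per class.
    prototypes:  (C, D) per-class mean visual features.
    """
    logits = prototypes @ classifiers.T                    # (C, C) class scores
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                    # correct-class terms only

# Toy data: two seen classes, two samples each.
features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
labels = np.array([0, 0, 1, 1])
prototypes = class_mean_features(features, labels, num_classes=2)

aligned = prototypes.copy()        # classifiers that match their own class mean
swapped = prototypes[::-1].copy()  # classifiers assigned to the wrong class
loss_aligned = prototype_cross_entropy(aligned, prototypes)
loss_swapped = prototype_cross_entropy(swapped, prototypes)
```

Because the loss is evaluated on C class means instead of N individual samples, each training step touches a C×D matrix, which is consistent with the lower computational cost the abstract mentions.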