Knowledge Graph Enhanced Multimodal Learning for Few-shot Visual Recognition

Mengya Han, Yibing Zhan, Baosheng Yu, Yong Luo, Bo Du, Dacheng Tao
DOI: https://doi.org/10.1109/mmsp55362.2022.9948891
2022-01-01
Abstract: Few-shot learning (FSL) aims to learn a classifier for novel classes when only a few labeled samples are available per category. Mainstream FSL approaches fall within the meta-learning paradigm, in which a meta-learner learns transferable knowledge and generalizes to new tasks. However, these approaches usually leverage information from a single modality (e.g., visual images) and do not exploit information from other modalities (e.g., a knowledge graph). Because labeled samples are scarce in FSL, enriching the information available for each example is one way to improve performance. This motivates us to develop a new meta-learning framework for few-shot visual recognition, termed Knowledge Graph enhanced FSL (KGFSL), which combines information from multiple modalities: 1) the visual information in images and 2) the rich semantic and structural information in a knowledge graph (KG). Specifically, KGFSL exploits the word embedding of each category and its relationships to other categories to improve visual-based models. A graph convolutional network (GCN) is first introduced to learn a semantic embedding for each node (a visual category) in the KG. The visual and semantic embeddings are then aligned and combined for the final prediction. The whole framework is trained end to end. We conduct extensive experiments on two widely used FSL benchmarks: miniImageNet and tieredImageNet. Experimental results demonstrate the effectiveness of multimodal information for few-shot learning, and our proposed method can significantly outperform state-of-the-art approaches.
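The pipeline described in the abstract (a GCN over category nodes, followed by combining semantic and visual embeddings) can be illustrated with a minimal NumPy sketch. The abstract does not say how the embeddings are aligned or combined, so everything specific here is an assumption for illustration: a single standard GCN layer with symmetric normalization, class prototypes computed as the mean of support features, a convex combination with a hypothetical weight `lam`, and nearest-prototype classification. All function names and the random toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, feats, weight):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ feats @ weight, 0.0)


def fuse_prototypes(visual_protos, semantic_emb, lam=0.5):
    """Assumed fusion rule: convex combination of the two modalities."""
    return lam * visual_protos + (1.0 - lam) * semantic_emb


def classify(queries, protos):
    """Assign each query to the prototype with the smallest squared distance."""
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Toy 3-way, 2-shot episode with random data (illustrative only).
rng = np.random.default_rng(0)
n_classes, n_shots, word_dim, feat_dim = 3, 2, 4, 5

# Hypothetical KG over the three categories: a chain 0 - 1 - 2.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
word_emb = rng.normal(size=(n_classes, word_dim))  # category word embeddings
weight = rng.normal(size=(word_dim, feat_dim))     # maps to the visual dimension

semantic_emb = gcn_layer(normalize_adjacency(adj), word_emb, weight)
support = rng.normal(size=(n_classes, n_shots, feat_dim))  # visual features
visual_protos = support.mean(axis=1)

protos = fuse_prototypes(visual_protos, semantic_emb, lam=0.5)
queries = rng.normal(size=(4, feat_dim))
print(classify(queries, protos))
```

In a trained system the GCN weight, the visual backbone, and any fusion parameters would be learned jointly across episodes; here they are fixed random values so the data flow is visible.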