Multi-level Fusion of Multi-modal Semantic Embeddings for Zero Shot Learning

Yuhan Liu, Zhe Kong, Neng Gao, Chenyang Tu, Xin Eric Wang, Yifei Zhang
DOI: https://doi.org/10.1145/3536221.3556575
2022-11-07
Abstract: Zero-shot learning aims to recognize objects whose instances may not be covered by the training data. To generalize knowledge from seen classes to novel ones, a semantic space is built that embeds knowledge from various views into multi-modal semantic embeddings. Existing semantic embeddings neglect the relationships between classes, which are essential for transferring knowledge between classes. Moreover, existing zero-shot learning models ignore the complementarity between semantic embeddings from different modalities. To tackle these problems, we use graph theory to explicitly model the interdependence between classes and thereby obtain semantic embeddings of a new modality. Furthermore, we pioneer a multi-level fusion model that effectively combines the knowledge encoded in multi-modal semantic embeddings. By virtue of a subsequent fusion block, the results of multi-level fusion can be further enriched and fused. Experiments show that our model achieves promising results on various datasets. An ablation study suggests that our method is well suited for zero-shot learning.
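The two ideas in the abstract, deriving an extra embedding from a class graph and fusing modalities at more than one level, can be illustrated with a small NumPy sketch. This is not the paper's architecture, which the abstract does not specify. It assumes a cosine-similarity k-nearest-neighbor class graph, one step of normalized neighbor averaging, feature-level concatenation, and score-level averaging. All names, dimensions, and the random, untrained projections are hypothetical placeholders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def class_graph(emb, k=2):
    """Symmetric k-nearest-neighbor graph over classes (cosine similarity)."""
    unit = l2_normalize(emb)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # never pick a class as its own neighbor
    n = emb.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # k most similar classes per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the graph undirected

def propagate(emb, adj):
    """One step of row-normalized neighbor averaging with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    return a_hat @ emb

def fuse_scores(x, embeddings, projections):
    """Score-level fusion: mean cosine compatibility across modalities."""
    x_unit = l2_normalize(x)
    scores = []
    for emb, proj in zip(embeddings, projections):
        cls = l2_normalize(emb @ proj)        # map class embeddings to visual space
        scores.append(x_unit @ cls.T)
    return np.mean(scores, axis=0)

# Toy data (assumed sizes): 5 classes, two semantic modalities, 4-d visual features.
rng = np.random.default_rng(0)
n_classes, d_attr, d_word, d_vis = 5, 8, 6, 4
attrs = rng.normal(size=(n_classes, d_attr))   # e.g. attribute vectors
words = rng.normal(size=(n_classes, d_word))   # e.g. word vectors

adj = class_graph(attrs, k=2)
graph_emb = propagate(attrs, adj)              # graph-derived embedding

# Feature-level fusion: concatenate all class embeddings.
early = np.concatenate([attrs, words, graph_emb], axis=1)

# Score-level fusion over the individual and concatenated embeddings.
modalities = [attrs, words, graph_emb, early]
projections = [rng.normal(size=(m.shape[1], d_vis)) for m in modalities]  # untrained

x = rng.normal(size=(3, d_vis))                # three visual feature vectors
scores = fuse_scores(x, modalities, projections)
pred = scores.argmax(axis=1)                   # predicted class index per sample
```

In a real system the projections would be learned on seen classes. The sketch only shows where graph-derived embeddings and the two fusion levels would sit in such a pipeline.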
Computer Science