Learning Visually Aligned Semantic Graph for Cross-Modal Manifold Matching

Yanan Li, Huanhang Hu, Donghui Wang
DOI: https://doi.org/10.1109/icip.2019.8803515
2019-01-01
Abstract: Many cross-modal learning tasks, such as reasoning and matching between vision and language, usually need to explicitly learn a mapping function between the semantic expressions of different modalities, whose manifold structures are inconsistent. We propose to solve such cross-modal learning tasks from the perspective of semantic manifold alignment. First, we extract the intrinsic manifold of each modal space and express it as a semantic graph. Then, we use a matrix decomposition strategy to learn a visually aligned semantic graph, on which we base a nonparametric graph inference method. Finally, we apply the method to a typical cross-modal learning task, zero-shot learning (ZSL). Extensive experimental results demonstrate its effectiveness and show promising results.
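The abstract describes the pipeline only at a high level. As a rough illustration of the general idea of graph-based inference for ZSL, the sketch below connects each unseen class to its k nearest seen classes in semantic space and infers the unseen class's visual prototype as a weighted average of the neighbours' visual prototypes. This is not the authors' method: the matrix-decomposition step that produces the visually aligned graph is omitted, and cosine similarity, the kNN weighting, and all function names, class names, and vectors (`knn_weights`, `infer_unseen_prototype`, `classify`, the toy data) are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_weights(query: Vector, nodes: Dict[str, Vector], k: int) -> Dict[str, float]:
    """Semantic-graph edges from `query` to its k most similar nodes.

    Assumption (not from the paper): negative similarities are clipped to 0
    and the k weights are normalized to sum to 1 (uniform if all are 0).
    """
    top = sorted(((cosine(query, v), name) for name, v in nodes.items()), reverse=True)[:k]
    top = [(max(s, 0.0), name) for s, name in top]
    total = sum(s for s, _ in top)
    if total == 0.0:
        return {name: 1.0 / len(top) for _, name in top}
    return {name: s / total for s, name in top}


def infer_unseen_prototype(unseen_sem: Vector,
                           seen_sem: Dict[str, Vector],
                           seen_visual: Dict[str, Vector],
                           k: int = 2) -> Vector:
    """Nonparametric inference (illustrative): the unseen class's visual
    prototype is the graph-weighted average of its neighbours' prototypes."""
    weights = knn_weights(unseen_sem, seen_sem, k)
    dim = len(next(iter(seen_visual.values())))
    proto = [0.0] * dim
    for name, w in weights.items():
        for i, x in enumerate(seen_visual[name]):
            proto[i] += w * x
    return proto


def classify(feature: Vector, prototypes: Dict[str, Vector]) -> str:
    """Assign a visual feature to the unseen class with the most similar prototype."""
    return max(prototypes, key=lambda c: cosine(feature, prototypes[c]))


# Toy example (hypothetical values): 3-d semantic vectors, 2-d visual prototypes.
seen_sem = {"horse": [1.0, 0.0, 0.0], "tiger": [0.0, 1.0, 0.0], "whale": [0.0, 0.0, 1.0]}
seen_visual = {"horse": [1.0, 0.0], "tiger": [0.0, 1.0], "whale": [-1.0, 0.0]}
unseen_sem = {"zebra": [0.7, 0.7, 0.0], "dolphin": [0.0, 0.0, 1.0]}

prototypes = {c: infer_unseen_prototype(v, seen_sem, seen_visual) for c, v in unseen_sem.items()}
print(classify([0.6, 0.5], prototypes))   # zebra
print(classify([-0.9, 0.1], prototypes))  # dolphin
```

Here "zebra" sits between "horse" and "tiger" in semantic space, so its inferred prototype is their average. A weighted average is chosen because it needs no training for unseen classes; the paper's visually aligned graph would presumably supply better edge weights than raw semantic similarity.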