Multi-Graph Based Hierarchical Semantic Fusion for Cross-Modal Representation

Lei Zhu, Chengyuan Zhang, Jiayu Song, Liangchen Liu, Shichao Zhang, Yangding Li
DOI: https://doi.org/10.1109/icme51207.2021.9428194
2021-01-01
Abstract: The main challenge of cross-modal retrieval is how to efficiently realize semantic alignment and reduce the heterogeneity gap. However, existing approaches ignore the multi-grained semantic knowledge that can be learned from different modalities. To this end, this paper proposes a novel end-to-end cross-modal representation method, termed Multi-Graph based Hierarchical Semantic Fusion (MG-HSF). The method integrates multi-graph hierarchical semantic fusion with cross-modal adversarial learning; it captures fine-grained and coarse-grained semantic knowledge from cross-modal samples and generates modality-invariant representations in a common subspace. To evaluate its performance, extensive experiments are conducted on three benchmarks. The experimental results show that our method is superior to state-of-the-art approaches.