Dual graph-structured semantics multi-subspace learning for cross-modal retrieval

Yirong Li, Xianghong Tang, Jianguang Lu, Yong Huang
DOI: https://doi.org/10.1007/s00530-024-01471-0
IF: 3.9
2024-09-28
Multimedia Systems
Abstract: With the rapid growth of big data, cross-modal retrieval has received widespread attention. Most current cross-modal retrieval methods pursue only a macro-level alignment of modal data in a shared space to obtain a common representation. However, they fall short of satisfactory retrieval performance because they neglect deep semantic alignment and the inherent differences between modalities. To address these limitations, this paper presents a dual graph-structured semantics multi-subspace learning (DGMS) method for cross-modal retrieval. In DGMS, a double semantics graph is established to represent the deep semantics of modal data, and a multiple subspace learning network constructs public and independent subspaces to capture the relevance and dissimilarity of modal data. Finally, a dual learning method based on the generative adversarial network is employed to further capture the joint probability distribution of the different modalities. Experiments on Wikipedia, XMedia, and Pascal Sentence demonstrate the superiority of DGMS, which not only learns deep structural semantics but also explores the consistency and diversity of modalities.
computer science, information systems, theory & methods