Cross-modal Metric Learning with Graph Embedding

Youcai Zhang, Xiaodong Gu
DOI: https://doi.org/10.1109/ijcnn.2018.8489110
2018-01-01
Abstract: Metric learning with neural networks has produced promising improvements in representation learning. Cross-modal retrieval, however, poses a unique challenge to metric learning: how to compute distances across different modalities such as images and text. Existing neural-network-based methods tend to build two separate branches, one for images and one for texts, to bridge the modality gap. Moreover, most of them cannot fully exploit the structure embedded in multimodal data. This paper introduces an embedding layer that provides a non-linear representation shared across modalities, and reformulates cross-modal retrieval as a graph embedding problem by constructing a multimodal graph. To learn the graph embedding, training pairs and triplets are generated uniformly from random walk sequences on the graph. Graph pair and triplet constraints are then imposed on the embedding layer to preserve the graph structure. Meanwhile, a classifier is trained on labeled data so that the learned embedding is coupled with semantic information. For optimization, the graph pair and triplet constraints are integrated with the supervised classifier into a unified multi-task learning framework. Experimental results on the Wiki and NUS-WIDE datasets demonstrate the effectiveness and superiority of the learned embedding for cross-modal retrieval.
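The abstract does not say how the multimodal graph is built or exactly how pairs and triplets are drawn from the walks. The following is a minimal Python sketch under stated assumptions, not the authors' method: nodes are image and text items, two items are linked when they share a label (a hypothetical construction), positive pairs are nodes that co-occur within a context window of a uniform random walk (in the style of DeepWalk), and each triplet's negative is drawn uniformly from nodes outside that window. The names `build_multimodal_graph`, `random_walk`, and `pairs_and_triplets` and the parameters `walks_per_node`, `walk_length`, and `window` are illustrative only.

```python
import random
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def build_multimodal_graph(labels: Dict[str, str]) -> Graph:
    """Hypothetical construction: link any two items (image or text) that share a label."""
    graph: Graph = {node: [] for node in labels}
    for a in labels:
        for b in labels:
            if a != b and labels[a] == labels[b]:
                graph[a].append(b)
    return graph


def random_walk(graph: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """Uniform random walk of up to `length` nodes; stops early at an isolated node."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def pairs_and_triplets(
    graph: Graph, walks_per_node: int, walk_length: int, window: int, rng: random.Random
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Collect (anchor, positive) pairs and (anchor, positive, negative) triplets from walks."""
    nodes = sorted(graph)
    pairs: List[Tuple[str, str]] = []
    triplets: List[Tuple[str, str, str]] = []
    for _ in range(walks_per_node):
        for start in nodes:
            walk = random_walk(graph, start, walk_length, rng)
            for i, anchor in enumerate(walk):
                lo, hi = max(0, i - window), min(len(walk), i + window + 1)
                context = set(walk[lo:hi])
                # Assumed negative sampling: uniform over nodes absent from this window.
                outside = [n for n in nodes if n not in context]
                for positive in walk[lo:hi]:
                    if positive == anchor:
                        continue
                    pairs.append((anchor, positive))
                    if outside:
                        triplets.append((anchor, positive, rng.choice(outside)))
    return pairs, triplets


if __name__ == "__main__":
    # Toy data: three image/text items with hypothetical labels "A" and "B".
    labels = {"img0": "A", "txt0": "A", "img1": "A", "txt1": "A", "img2": "B", "txt2": "B"}
    graph = build_multimodal_graph(labels)
    pairs, triplets = pairs_and_triplets(
        graph, walks_per_node=2, walk_length=5, window=2, rng=random.Random(0)
    )
    print(len(pairs), len(triplets))
    print(triplets[:3])
```

Because image and text nodes live in the same graph, a single walk can step from an image to a text and back, so the sampled pairs and triplets are naturally cross-modal without a separate branch per modality.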
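The abstract names three training signals (a graph pair constraint, a graph triplet constraint, and a supervised classifier) combined into one multi-task objective, but not their exact form. A common way to express them, assumed here rather than taken from the paper, is a squared-distance pull term for pairs, a hinge triplet loss with a margin, and softmax cross-entropy for classification, summed with weights. The weights `alpha` and `beta` and the `margin` are hypothetical hyperparameters, and plain Python lists stand in for the outputs of the shared embedding layer.

```python
import math
from typing import List, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pair_loss(anchor: Sequence[float], positive: Sequence[float]) -> float:
    """Pair constraint (assumed form): pull embeddings of a co-occurring pair together."""
    return sq_dist(anchor, positive)


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Triplet constraint (assumed form): anchor closer to positive than negative by `margin`."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for the classifier, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def multitask_loss(
    cls_terms: List[float],
    pair_terms: List[float],
    triplet_terms: List[float],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> float:
    """Weighted sum of the three averaged objectives; alpha and beta are hypothetical weights."""
    return _mean(cls_terms) + alpha * _mean(pair_terms) + beta * _mean(triplet_terms)


# Toy 2-D embeddings, assumed to come from a shared embedding layer.
anchor, positive, negative = [0.0, 0.0], [1.0, 0.0], [3.0, 0.0]
total = multitask_loss(
    cls_terms=[cross_entropy([2.0, 0.5], label=0)],
    pair_terms=[pair_loss(anchor, positive)],
    triplet_terms=[triplet_loss(anchor, positive, negative, margin=1.0)],
    alpha=0.5,
    beta=1.0,
)
print(round(total, 4))
```

In this toy case the triplet term is already satisfied (the negative is far enough away), so only the classification and pair terms contribute. Summing the terms into one scalar is what lets the structural constraints and the semantic classifier be optimized jointly.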