Metric Based On Multi-Order Spaces For Cross-Modal Retrieval

Liang Zhang, Bingpeng Ma, Guorong Li, Qingming Huang
DOI: https://doi.org/10.1109/ICME.2017.8019409
2017-01-01
Abstract: This paper proposes a novel method for cross-modal retrieval. Unlike the vector (text)-to-vector (image) framework of traditional cross-modal methods, we adopt a vector (text)-to-matrix (image) framework. We assume that, compared with vectors, matrices can represent images directly and characterize the structure of the feature space. Building on this, we propose a Metric based on Multi-order spaces (MMs). Multi-order statistical features are used to represent images and enrich their semantic information, and metrics over the multiple spaces are jointly learned to measure the similarity between the two modalities. Specifically, MMs consists of three steps. First, we jointly use bags of visual features (zero-order), the mean (first-order), and the covariance (second-order) to characterize each image. Second, since covariance matrices lie on a Riemannian manifold while vectors lie in a Euclidean space, we embed the multi-order spaces into their corresponding Hilbert spaces to reduce the heterogeneity among the original spaces. Finally, the similarity between the two modalities is measured by learning multiple transformations from the different Hilbert spaces to a common subspace. Experiments on two public datasets demonstrate that the proposed method outperforms state-of-the-art methods.
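The first two steps (multi-order image features, then embedding them into vector spaces) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The local descriptors and the codebook are hypothetical inputs, and hard nearest-codeword assignment stands in for however the bags of visual features are actually built. The log-Euclidean map (matrix logarithm followed by vectorization) is a swapped-in, commonly used way to move covariance matrices off the Riemannian manifold into a Euclidean, hence Hilbert, space. The paper may use a different, for example kernel-based, embedding.

```python
import numpy as np

def multi_order_features(descriptors, codebook, eps=1e-6):
    """Zero-, first- and second-order statistics of one image's local descriptors.

    descriptors: (n, d) array of local features (hypothetical input, n >= 2).
    codebook:    (k, d) array of visual words (assumed pre-learned, e.g. by k-means).
    """
    # Zero-order: bag-of-visual-words histogram via nearest-codeword assignment.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    hist /= hist.sum()
    # First-order: mean descriptor.
    mean = descriptors.mean(axis=0)
    # Second-order: covariance, regularized so it is strictly positive definite.
    cov = np.cov(descriptors, rowvar=False) + eps * np.eye(descriptors.shape[1])
    return hist, mean, cov

def log_euclidean_embedding(cov):
    """Map an SPD matrix to a flat vector through the matrix logarithm.

    Illustrative choice (log-Euclidean map), not necessarily the paper's embedding.
    """
    # Matrix logarithm of an SPD matrix: V diag(log w) V^T.
    w, v = np.linalg.eigh(cov)
    log_cov = (v * np.log(w)) @ v.T
    # Keep the upper triangle; scale off-diagonal entries by sqrt(2) so the
    # vector's Euclidean norm equals the Frobenius norm of log_cov.
    iu = np.triu_indices(cov.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return log_cov[iu] * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    desc = rng.normal(size=(200, 8))      # synthetic descriptors for one image
    codebook = rng.normal(size=(16, 8))   # synthetic 16-word codebook
    hist, mean, cov = multi_order_features(desc, codebook)
    z = log_euclidean_embedding(cov)
    print(hist.shape, mean.shape, z.shape)
```

For 8-dimensional descriptors, `log_euclidean_embedding` returns a 36-dimensional vector (the 8·9/2 upper-triangular entries), which can then be handled alongside the histogram and mean vectors as ordinary Euclidean features.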
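The third step, learning one transformation per space into a common subspace, can be illustrated with a deliberately simple stand-in: closed-form ridge regression that maps the concatenated image embeddings and the text vectors onto a class-label space, followed by cosine similarity in that space. The label-space target, the ridge objective, and all function names here are assumptions for illustration; the paper's actual metric-learning objective is not reproduced.

```python
import numpy as np

def ridge(X, Y, lam):
    """Closed-form ridge regression: argmin_W ||X W - Y||^2 + lam * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def fit_common_subspace(image_blocks, text, labels, n_classes, lam=1.0):
    """Learn one linear map per image feature space plus one for text.

    image_blocks: list of (n, d_k) arrays, one per embedded order space.
    text:         (n, d_t) text vectors.
    labels:       (n,) integer class labels. Assumption for this sketch: the
                  common subspace is the n_classes-dimensional label space.
    """
    Y = np.eye(n_classes)[labels]
    # Fitting the concatenation jointly couples the per-space maps W_k,
    # since X @ W == sum_k X_k @ W_k.
    X = np.hstack(image_blocks)
    W = ridge(X, Y, lam)
    splits = np.cumsum([b.shape[1] for b in image_blocks])[:-1]
    W_img = np.split(W, splits, axis=0)
    W_txt = ridge(text, Y, lam)
    return W_img, W_txt

def cross_modal_similarity(image_parts, text_vec, W_img, W_txt):
    """Cosine similarity between one image and one text in the common subspace."""
    # Sum of per-space projections: sum_k phi_k(x) @ W_k.
    u = sum(p @ W for p, W in zip(image_parts, W_img))
    v = text_vec @ W_txt
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
```

Fitting the concatenated image blocks in one regression means each per-space map is learned with the others in view, which loosely mirrors the abstract's point that the metrics over the multi-order spaces are learned jointly rather than independently.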