Cross-Modal Learning to Rank Via Latent Joint Representation

Fei Wu, Xinyang Jiang, Xi Li, Siliang Tang, Weiming Lu, Zhongfei Zhang, Yueting Zhuang
DOI: https://doi.org/10.1109/tip.2015.2403240
IF: 10.6
2015-01-01
IEEE Transactions on Image Processing
Abstract: Cross-modal ranking is central to many applications involving multimodal data. Discovering a joint representation for multimodal data and learning a ranking function are both essential for improving cross-media retrieval (i.e., image-query-text or text-query-image). In this paper, we propose an approach that discovers the latent joint representation of pairs of multimodal data (e.g., pairs of an image query and a text document) via a conditional random field and structural learning in a listwise ranking manner. We call this approach cross-modal learning to rank via latent joint representation (CML2R). In CML2R, the correlations between multimodal data are captured through the hidden variables they share (e.g., topics), and a hidden-topic-driven discriminative ranking function is learned in a listwise ranking manner. The experiments show that the proposed approach achieves good performance in cross-media retrieval while also learning a discriminative representation of multimodal data.
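To make the idea of listwise ranking over shared hidden topics concrete, the following is a minimal sketch, not the CML2R model itself. It swaps in a ListNet-style top-one cross-entropy loss in place of the paper's conditional random field and structural learning, and it assumes that queries and documents have already been mapped to topic proportions. The names `joint_score` and `listwise_loss`, the dot-product scoring, and the toy topic vectors and relevance labels are all hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def joint_score(query_topics, doc_topics):
    """Hypothetical relevance score: dot product of shared topic proportions."""
    return sum(q * d for q, d in zip(query_topics, doc_topics))


def listwise_loss(predicted_scores, relevance_labels):
    """ListNet-style top-one cross entropy over one whole ranked list.

    This is an assumed stand-in for a listwise objective, not the
    structural-learning objective used in the paper.
    """
    target = softmax([float(r) for r in relevance_labels])
    predicted = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# Toy data (assumed): one image query and three text documents,
# each described by proportions over three hidden topics.
image_query = [0.7, 0.2, 0.1]
text_docs = [
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.5, 0.2],
]
relevance = [2, 0, 1]  # graded relevance of each document to the query

scores = [joint_score(image_query, doc) for doc in text_docs]
loss = listwise_loss(scores, relevance)
print(scores, loss)
```

The loss is computed over the entire candidate list at once rather than over individual documents or pairs, which is what distinguishes a listwise objective from pointwise and pairwise ones. Scores that order the documents consistently with the relevance labels yield a lower loss than scores that order them inconsistently.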