Deep Pairwise Ranking with Multi-label Information for Cross-Modal Retrieval.

Yangwo Jian, Jing Xiao, Yang Cao, Asad Khan, Jia Zhu
DOI: https://doi.org/10.1109/icme.2019.00311
2019-01-01
Abstract: Cross-modal retrieval (i.e., image-to-text or text-to-image retrieval) has gained much attention in recent years owing to the growing demand created by enormous amounts of multi-modal data. To alleviate the problem that existing methods ignore the irrelevant information present between images and texts, this paper proposes a Deep Pairwise Ranking model with multi-label information for Cross-Modal retrieval (DPRCM). DPRCM directly learns a mapping from images and texts to a compact Euclidean space in which distances correspond to the similarity between images and texts. The bi-triplet loss function in DPRCM reduces the distance between associated images and texts in the common subspace and enlarges the margin separating unrelated samples. The classification loss function makes better use of the multi-label information to reduce the semantic gap between image features and text descriptions. Experiments on three widely used datasets show that DPRCM achieves competitive performance compared with state-of-the-art methods.
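The abstract names two loss terms but does not give their formulas. The following is a minimal sketch of how such terms are commonly written, not the paper's actual formulation. It assumes squared Euclidean distance, a hinge-style triplet term applied in both retrieval directions, and sigmoid binary cross-entropy for the multi-label term. All function names, the margin, and the weight `lam` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge(anchor, positive, negative, margin):
    """Hinge triplet term: pull the positive closer than the negative by `margin`."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


def bi_triplet_loss(img, txt_pos, txt_neg, txt, img_pos, img_neg, margin=1.0):
    """Assumed bi-directional form: one image-anchored and one text-anchored term."""
    image_to_text = triplet_hinge(img, txt_pos, txt_neg, margin)
    text_to_image = triplet_hinge(txt, img_pos, img_neg, margin)
    return image_to_text + text_to_image


def multilabel_bce(logits, labels):
    """Mean sigmoid cross-entropy over labels, in a numerically stable form."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Equivalent to -[y*log(s(z)) + (1-y)*log(1-s(z))] with s = sigmoid.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def total_loss(ranking, classification, lam=1.0):
    """Hypothetical weighted sum of the two terms; `lam` is an assumed hyperparameter."""
    return ranking + lam * classification
```

In this sketch the ranking term is zero once each matched pair is closer than its mismatched pair by at least the margin, so only the classification term continues to shape the embedding for those samples.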