Multimedia Retrieval by Deep Hashing with Multilevel Similarity Learning.

Qiuli Liu,Lu Jin,Zechao Li,Jinhui Tang
DOI: https://doi.org/10.1016/j.jvcir.2018.11.011
IF: 2.887
2019-01-01
Journal of Visual Communication and Image Representation
Abstract:Deep multimodal hashing has received increasing research attention in recent years due to its superior performance for large-scale multimedia retrieval. However, limited e orts have been made to explore the complex multilevel semantic structure for deep multimodal hashing. In this paper, we propose a novel deep multimodal hashing method, termed as Deep Hashing with Multilevel Similarity Learning (DHMSL), for learning compact and discriminative hash codes, which explores multilevel semantic similarity correlations of multimedia data. In DHMSL, multilevel similarity correlation is explored to learn the unified binary hash codes by exploiting the local structure and semantic label information simultaneously. Meanwhile, the bit balance and quantization constraints are taken into account to further make the unified hash codes compact. With the unified binary codes learned, two deep neural networks are jointly trained to simultaneously learn feature representations and two sets of nonlinear hash functions. Specifically, the well-designed loss functions are introduced to minimize the prediction errors of the feature representations as well as the errors between the unified binary codes and outputs of the networks. Extensive experiments on two widely-used multimodal datasets demonstrate that the proposed method can achieve the state-of-the-art performance for both image-query-text and text-query-image tasks.
What problem does this paper attempt to address?