Efficient Fine-Grained Visual-Text Search Using Adversarially-Learned Hash Codes

Yongzhi Li, Yadong Mu, Nan Zhuang, Xianglong Liu
DOI: https://doi.org/10.1109/icme51207.2021.9428271
2021-01-01
Abstract: Cross-modal hashing for efficient visual-text search has attracted considerable research interest in recent years. The main argument of this work is that existing hashing methods mainly rely on a multi-label matching paradigm and ignore fine-grained semantics (high-order relationships, object attributes, etc.) in multi-modal data. This paper explores cross-modal hashing from two rarely explored aspects. First, we propose an efficient two-step hashing scheme that quickly screens out irrelevant samples using global features and then generates fine-grained features, guided by high-order concepts, to re-rank the surviving candidates. Second, the robustness of the cross-modal hashing model, particularly under subtle tampering of fine-grained queries, is formally investigated. We propose a rephrase and adversarial training strategy for obtaining better performance and robustness. Comprehensive experiments and ablation studies on two large public datasets (MS-COCO and Flickr30K) demonstrate the proposed method’s superiority in terms of both efficiency and accuracy.
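The two-step scheme described in the abstract can be illustrated with a minimal sketch of coarse-to-fine retrieval over binary codes. This is not the authors' implementation: the function names (`hamming_distances`, `two_step_search`), the parameters `n_candidates` and `top_k`, and the use of plain Hamming distance at both stages are assumptions made for illustration. In particular, the sketch takes precomputed fine-grained codes as input, whereas the paper generates fine-grained features only for the candidates that survive screening.

```python
import numpy as np


def hamming_distances(query, database):
    """Hamming distance between one binary code of shape (bits,)
    and every row of a binary code matrix of shape (n, bits)."""
    return np.count_nonzero(database != query, axis=1)


def two_step_search(q_global, q_fine, db_global, db_fine,
                    n_candidates=100, top_k=10):
    """Hypothetical coarse-to-fine search over binary hash codes.

    Step 1 screens the whole database with short global codes.
    Step 2 re-ranks only the surviving candidates with fine-grained codes.
    Returns database indices, best match first.
    """
    # Step 1: cheap screening over all items using global codes.
    coarse = hamming_distances(q_global, db_global)
    n_candidates = min(n_candidates, len(coarse))
    candidates = np.argsort(coarse, kind="stable")[:n_candidates]

    # Step 2: re-rank the survivors using fine-grained codes only.
    fine = hamming_distances(q_fine, db_fine[candidates])
    order = np.argsort(fine, kind="stable")[:top_k]
    return candidates[order]


# Toy usage with random codes (assumed sizes: 1000 items, 32-bit codes).
rng = np.random.default_rng(0)
db_global = rng.integers(0, 2, size=(1000, 32))
db_fine = rng.integers(0, 2, size=(1000, 32))
query_global = rng.integers(0, 2, size=32)
query_fine = rng.integers(0, 2, size=32)
print(two_step_search(query_global, query_fine, db_global, db_fine))
```

The design point is that the second stage touches only `n_candidates` items, so any more expensive fine-grained comparison is paid for a small fraction of the database.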