Multiple Information Embedded Hashing for Large-Scale Cross-Modal Retrieval
Yongxin Wang, Yu-Wei Zhan, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu
DOI: https://doi.org/10.1109/tcsvt.2023.3340102
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Recently, many efforts have been devoted to improving the retrieval performance of supervised cross-modal hashing; however, current methods are gradually reaching a performance bottleneck, especially when dealing with real-world multimedia data. This is mainly due to their reliance on coarse-grained semantics, non-robust hash functions, and inflexible workflows. Therefore, discovering the refined semantics hidden in the data, designing robust hash functions, and creating a learning workflow whose steps facilitate rather than interfere with one another become especially important. With this motivation, we propose a novel supervised cross-modal hashing method, Multiple Information Embedded Hashing (MIEH). It consists of a three-step workflow that flexibly handles multiple information mining, hash code learning, and hash function learning. First, it explores the multimedia data from multiple perspectives, such as modal-level consistency, class-level discriminability, and instance-level similarity, to mine comprehensive semantic information, which not only contributes to the generation of discriminative hash codes but also accelerates convergence. Next, MIEH embeds the refined semantics into the target hash codes with an efficient discrete optimization algorithm. Finally, it improves the learning ability of the linear hash function through noisy example erasing and deviation correcting, which yields a more robust hash function. Extensive experiments on three popular benchmark datasets highlight the superiority of MIEH on large-scale cross-modal retrieval tasks and demonstrate its competitive performance against state-of-the-art approaches. The source code is available.
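
For readers unfamiliar with the retrieval setting, the following is a minimal, generic sketch of how linear hash functions are typically used for cross-modal retrieval: each modality has its own projection, features are binarized by sign, and database items are ranked by Hamming distance to the query code. It is not the MIEH algorithm; the abstract does not specify the learning procedure, so the projection matrices `W_IMG` and `W_TXT`, the feature values, and all function names here are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def linear_hash(x: Sequence[float], W: Sequence[Sequence[float]]) -> List[int]:
    """Map a feature vector to a code in {-1, +1}^r via sign(W x).

    Each row of W produces one bit; a zero projection is mapped to +1.
    """
    return [
        1 if sum(w_i * x_i for w_i, x_i in zip(row, x)) >= 0 else -1
        for row in W
    ]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def rank_by_hamming(query_code: Sequence[int],
                    db_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))


# Hypothetical, hand-picked projections (3-bit codes). In a real method
# these would be learned from training data, not written by hand.
W_IMG = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]                  # 2-d image features
W_TXT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]    # 3-d text features

# Toy image query and toy text database (made-up feature values).
image_query = [0.9, -0.4]
text_db = [[-1.0, 0.5, -0.2], [0.8, -0.3, 0.6], [0.7, 0.2, 0.9]]

query_code = linear_hash(image_query, W_IMG)
db_codes = [linear_hash(t, W_TXT) for t in text_db]
ranking = rank_by_hamming(query_code, db_codes)

print("query code:", query_code)
print("ranking of text items:", ranking)
```

Because both modalities are mapped into the same binary space, an image query can be compared against text codes directly, and the comparison reduces to counting differing bits, which is what makes hashing attractive at large scale.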