Dark knowledge association guided hashing for unsupervised cross-modal retrieval

Han Kang, Xiaowei Zhang, Wenpeng Han, Mingliang Zhou
DOI: https://doi.org/10.1007/s00530-024-01539-x
IF: 3.9
2024-12-03
Multimedia Systems
Abstract: Unsupervised cross-modal hashing has attracted much attention in large-scale cross-modal retrieval due to its low storage consumption and high retrieval efficiency. However, existing unsupervised hashing methods fail to capture the relevance of implicit knowledge in large cross-modal models (e.g., CLIP), which leads to an incomplete representation of semantic information in the hash codes. To solve this problem, in this paper we introduce a new approach called Dark Knowledge Association Guided Hashing (DKAGH) for unsupervised cross-modal retrieval. Specifically, we propose a new cross-modal interaction attention module to enhance heterogeneous semantic interactions, while a similarity distillation module extracts rich implicit information from CLIP models to optimise cross-modal similarity relations. We then propose a concept-aware semantic hashing module, which uses concept-aware encoders to decouple the multimodal features and capture implicit concept representations, and applies a contrastive loss to the concept-aware hash codes to align the heterogeneous modalities for multimodal hash learning. Extensive experiments on three cross-modal retrieval datasets demonstrate that DKAGH achieves state-of-the-art performance.
computer science, information systems, theory & methods
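The abstract describes two training signals only at a high level: a similarity distillation module that transfers CLIP's pairwise similarity structure to the hash codes, and a contrastive loss that aligns image and text codes. Below is a minimal, stdlib-only Python sketch of generic versions of these ideas. It is not DKAGH's actual formulation, which the abstract does not specify. Assumptions (all hypothetical): the teacher similarity is the average of intra-modal cosine similarities of frozen CLIP-style features, distillation uses a mean squared error, the contrastive term is a symmetric InfoNCE loss with temperature tau=0.3, and relaxed tanh codes are quantized with sign() for Hamming-distance retrieval. All function names are illustrative only.

```python
import math


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    def norm(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n > 0 else list(v)
    a_n = [norm(v) for v in a]
    b_n = [norm(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def distillation_target(img_feats, txt_feats):
    """Hypothetical teacher similarity: average of the intra-modal cosine
    similarities computed on frozen CLIP-style features."""
    s_img = cosine_matrix(img_feats, img_feats)
    s_txt = cosine_matrix(txt_feats, txt_feats)
    n = len(img_feats)
    return [[0.5 * (s_img[i][j] + s_txt[i][j]) for j in range(n)] for i in range(n)]


def distillation_loss(img_codes, txt_codes, target):
    """Mean squared error between the cross-modal similarity of the relaxed
    codes and the teacher similarity matrix (assumed loss form)."""
    s_code = cosine_matrix(img_codes, txt_codes)
    n = len(target)
    return sum((s_code[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def contrastive_loss(img_codes, txt_codes, tau=0.3):
    """Symmetric InfoNCE: the i-th image and the i-th text form the positive pair."""
    sim = cosine_matrix(img_codes, txt_codes)
    n = len(sim)

    def one_direction(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / tau for s in row]
            m = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_z - logits[i]  # negative log-probability of the positive
        return total / n

    sim_t = [[sim[j][i] for j in range(n)] for i in range(n)]  # text-to-image view
    return 0.5 * (one_direction(sim) + one_direction(sim_t))


def binarize(codes):
    """sign() quantization of relaxed (tanh) codes into {-1, +1}."""
    return [[1 if x >= 0 else -1 for x in v] for v in codes]


def hamming(b1, b2):
    """Number of differing bits between two {-1, +1} codes."""
    return sum(1 for x, y in zip(b1, b2) if x != y)


if __name__ == "__main__":
    # Toy "CLIP" features for two image-text pairs (made-up numbers).
    img_feats = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.6]]
    txt_feats = [[0.8, 0.2, 0.1], [0.1, 0.7, 0.7]]
    target = distillation_target(img_feats, txt_feats)
    # Relaxed codes as a hashing network might output them (tanh of logits).
    img_codes = [[math.tanh(x) for x in v] for v in [[1.2, -0.4, 0.3], [-0.8, 0.9, 0.5]]]
    txt_codes = [[math.tanh(x) for x in v] for v in [[1.0, -0.2, 0.4], [-0.6, 1.1, 0.2]]]
    print("distillation:", distillation_loss(img_codes, txt_codes, target))
    print("contrastive:", contrastive_loss(img_codes, txt_codes))
    b_img, b_txt = binarize(img_codes), binarize(txt_codes)
    print("Hamming(img0, txt0):", hamming(b_img[0], b_txt[0]))
```

In a real pipeline the features would come from a frozen CLIP encoder and the codes from trainable hashing networks optimised by gradient descent; plain Python lists are used here only to keep the sketch self-contained.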