Improve Deep Hashing with Language Guidance for Unsupervised Image Retrieval

Chuang Zhao, Hefei Ling, Shijie Lu, Yuxuan Shi, Jiazhong Chen, Ping Li
DOI: https://doi.org/10.1145/3652583.3658059
2024-01-01
Abstract: Hashing methods are widely used in multimedia retrieval systems because of their outstanding retrieval efficiency and low storage cost. Most existing unsupervised hashing methods learn binary hash codes by preserving similarity structure or by contrastive learning over hash codes. However, these methods usually rely on the visual similarity of images to guide hash learning, which does not fully exploit the high-level semantic concept information contained in images and limits retrieval performance. To tackle this problem, we propose a novel deep unsupervised hashing method called Language Guidance Hashing (LGH). Specifically, LGH uses a language model to mine high-level semantic concept information in images and construct a language-based similarity structure, which is used to guide hash learning. Introducing features from the textual modality brings higher information gain. In addition, we propose a language-guided contrastive learning method for learning high-quality binary hash codes. Extensive experimental results show that LGH significantly outperforms state-of-the-art unsupervised hashing methods on three benchmark image datasets.
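To make the idea of a similarity structure guiding hash learning concrete, here is a minimal sketch. It is not the LGH implementation: the abstract does not specify the loss or the architecture, so the cosine-similarity target, the squared-error objective, and all function names below are assumptions chosen for illustration, and random vectors stand in for the text features a language model would produce.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T

def binarize(real_codes):
    """Map real-valued outputs to {-1, +1} hash codes (zero maps to +1)."""
    return np.where(real_codes >= 0, 1, -1)

def similarity_preserving_loss(codes, target_similarity):
    """Mean squared gap between scaled code inner products and the target.

    Assumed generic objective, not the loss used in the paper.
    """
    n_bits = codes.shape[1]
    code_similarity = (codes @ codes.T) / n_bits  # values lie in [-1, 1]
    return float(np.mean((code_similarity - target_similarity) ** 2))

# Hypothetical stand-ins: 4 images, 16-dim text features, 8-bit codes.
rng = np.random.default_rng(0)
text_features = rng.normal(size=(4, 16))
language_similarity = cosine_similarity_matrix(text_features)

hash_outputs = rng.normal(size=(4, 8))  # would come from a hashing network
codes = binarize(hash_outputs)
loss = similarity_preserving_loss(codes, language_similarity)
```

In a trained system, the hashing network would be optimized so that this gap shrinks, which pulls images with similar language-derived concepts toward similar binary codes.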