Text-assisted attention-based cross-modal hashing

Xiang Yuan, Shihao Shan, Yuwen Huo, Junkai Jiang, Song Wu
DOI: https://doi.org/10.1007/s13735-023-00311-7
2024-01-10
International Journal of Multimedia Information Retrieval
Abstract: As one of the hottest research topics in multimedia information retrieval, cross-modal hashing has drawn widespread attention over the past decades. A key challenge for this task is minimizing the semantic gap between heterogeneous data and accurately calculating the similarity of cross-modal data. One paradigm for tackling this problem is to map the features of multi-modal data into a common space. However, these approaches lack inter-modal information interaction and may not achieve satisfactory results. To overcome this problem, we propose a novel text-assisted attention-based cross-modal hashing (TAACH) method. First, TAACH relies on LabelNet supervision to guide the learning of hash functions for each modality. In addition, a novel text-assisted attention mechanism is designed to densely integrate text features into image features, perceiving their spatial correlation and enhancing the consistency of image and text knowledge. Extensive experiments on four benchmark datasets show the effectiveness of the proposed TAACH, which also achieves competitive performance compared to state-of-the-art methods. The source code is available at https://github.com/SWU-CS-MediaLab/TAACH.
computer science, artificial intelligence, software engineering
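To make the idea in the abstract concrete, the following is a minimal sketch, not the TAACH architecture itself. It assumes a generic scaled dot-product attention in which a single text feature reweights a set of image region features, followed by sign binarization and Hamming-distance comparison as used in hashing-based retrieval. All function names and the feature layout are hypothetical; the authors' actual implementation is in the repository linked in the abstract.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def text_assisted_attention(image_regions, text_feature):
    # Hypothetical helper: weight each image region by its scaled
    # dot-product similarity to the text feature, then pool the regions.
    scale = math.sqrt(len(text_feature))
    scores = [
        sum(r * t for r, t in zip(region, text_feature)) / scale
        for region in image_regions
    ]
    weights = softmax(scores)
    fused = [
        sum(w * region[d] for w, region in zip(weights, image_regions))
        for d in range(len(text_feature))
    ]
    return fused, weights

def to_hash_code(feature):
    # Binarize a real-valued feature into a {-1, +1} code by sign.
    return [1 if x >= 0 else -1 for x in feature]

def hamming_distance(a, b):
    # Number of positions where two equal-length codes differ.
    return sum(1 for x, y in zip(a, b) if x != y)

if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]   # two toy image regions
    text = [1.0, 0.0]                    # toy text feature
    fused, weights = text_assisted_attention(regions, text)
    print(weights, to_hash_code(fused))
```

In this toy setup the region aligned with the text feature receives the larger attention weight, so the pooled image feature is pulled toward the text. Retrieval then ranks database items by the Hamming distance between binary codes.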