Unsupervised Dual Hashing Coding (UDC) on Semantic Tagging and Sample Content for Cross-modal Retrieval

Hongmin Cai, Bin Zhang, Junyu Li, Bin Hu, Jiazhou Chen
DOI: https://doi.org/10.1109/tmm.2024.3385986
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Current cross-modal retrieval methods rely heavily on accurate semantic labels or sample similarity measurements, and must search for the nearest samples across the entire search space, which severely limits their application in stratifying large-scale and high-dimensional multimodal data. To tackle these issues, this paper proposes an unsupervised cross-modal retrieval method, named unsupervised dual hashing coding (UDC), that bypasses semantic-wise supervision and sample-wise similarity by taking a feature-wise matching standpoint. It jointly learns dual hashing codes on semantic tagging and sample content by factorizing a feature matching potential, which bridges the semantic and heterogeneous gaps among different modalities simultaneously by maintaining inter-modality-consistent semantic information and cross-modality-correlated sample content. In this way, each sample is uniquely coded by a head code on semantic-wise tags and tail codes on sample-wise content. The dual coding design makes sample retrieval very efficient: a query sample only needs to search among candidates with the same semantic tag, which greatly narrows the search space. The proposed model avoids computing massive sample-wise similarities and works with the dual hashing coding scheme, achieving a twofold efficiency enhancement for analyzing large-scale and high-dimensional multimodal data. Extensive experiments demonstrate its superiority in computational time and retrieval performance.
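The retrieval step described in the abstract (match on the head code first, then rank within that group by tail code) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the integer encoding of the binary codes, the Hamming-distance ranking of tail codes, and the toy database are all assumptions made for illustration, and the learning of the codes themselves is not shown.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def build_index(items):
    """Group (item_id, head_code, tail_code) triples into buckets keyed by head code."""
    index = defaultdict(list)
    for item_id, head, tail in items:
        index[head].append((item_id, tail))
    return index


def retrieve(index, query_head, query_tail, top_k=3):
    """Rank only the candidates that share the query's head code.

    Candidates are ordered by Hamming distance between tail codes
    (an assumed ranking rule), with ties broken by item id.
    """
    candidates = index.get(query_head, [])
    ranked = sorted(candidates, key=lambda c: (hamming(c[1], query_tail), c[0]))
    return [item_id for item_id, _ in ranked[:top_k]]


# Hypothetical database: 2-bit head codes (semantic tag), 8-bit tail codes (content).
database = [
    ("img_0", 0b01, 0b10110010),
    ("img_1", 0b01, 0b10110011),
    ("img_2", 0b10, 0b10110010),
    ("img_3", 0b01, 0b01001101),
]

index = build_index(database)

# Hypothetical query from another modality, already mapped to head and tail codes.
results = retrieve(index, query_head=0b01, query_tail=0b10110010, top_k=2)
print(results)
```

In this sketch, img_2 is never compared against the query even though its tail code matches exactly, because its head code differs. That is the search-space narrowing the abstract refers to: the cost of a query scales with the size of one bucket rather than with the whole database.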