Deep Neighborhood-preserving Hashing with Quadratic Spherical Mutual Information for Cross-modal Retrieval
Qibing Qin, Yadong Huo, Lei Huang, Jiangyan Dai, Huihui Zhang, Wenfeng Zhang
DOI: https://doi.org/10.1109/tmm.2023.3349075
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Driven by the high nonlinearity of deep neural networks, deep hashing has shown great potential in cross-modal retrieval applications, significantly bridging the modality gap. Current deep cross-modal hashing methods usually rely on affinity matching or local ranking to capture local semantic relationships in the learned common space, which leads to high neighborhood ambiguity. Moreover, most of these frameworks use additional regularization terms or margin thresholds to improve overall performance, and searching for the resulting hyper-parameters on large-scale training data incurs substantial overhead. In this paper, building on a novel extension of information-theoretic measures, a deep cross-modal hashing method named Deep Neighborhood-preserving Hashing (DNpH) is designed to learn a highly separable discrete space, effectively mitigating the semantic gap across different modalities. Specifically, to minimize neighborhood ambiguity, the Quadratic Spherical Mutual Information (QSMI) is first introduced into deep cross-modal hashing to separate neighbors from non-neighbors; unlike other similarity measures, it requires no parameter tuning during model training. To optimize the quadratic mutual information loss smoothly, a square clamping method is developed to improve the stability of model optimization and avoid convergence to a bad local optimum. In addition, two transformer encoders are used as feature extractors for multi-modal samples to learn informative semantic representations. Finally, we compare the proposed DNpH framework with various state-of-the-art cross-modal hashing methods on four public datasets; extensive experimental results validate our contributions and show that DNpH outperforms the compared baselines on different evaluation metrics. The corresponding code is available at https://github.com/QinLab-WFU/DNpH.