Deep Hierarchy-aware Proxy Hashing with Self-paced Learning for Cross-modal Retrieval

Yadong Huo, Qibing Qin, Wenfeng Zhang, Lei Huang, Jie Nie
DOI: https://doi.org/10.1109/tkde.2024.3401050
2024-01-01
Abstract: Due to its low storage cost and high retrieval efficiency, hashing is widely applied in both academia and industry and offers a promising solution for cross-modal similarity retrieval. However, most existing supervised cross-modal hashing methods use the fixed-level semantic affinity defined by manual labels as the supervisory signal for hash learning. This affinity captures only a small subset of the complex semantic relations between multi-modal samples, which impedes hash function learning and degrades the resulting hash codes. In this paper, we propose a novel deep cross-modal hashing framework, called Deep Hierarchy-aware Proxy Hashing (DHaPH), which learns shared hierarchy proxies to construct the semantic hierarchy in a data-driven manner. This captures accurate fine-grained semantic relationships and yields small intra-class scatter and large inter-class scatter. Specifically, by treating the hierarchical proxies as learnable ancestors, we design a novel hierarchy-aware proxy loss that models the latent semantic hierarchical structures of different modalities without prior hierarchy knowledge, such that similar samples share the same Lowest Common Ancestor (LCA) and dissimilar samples have different LCAs. Meanwhile, to adequately capture valuable semantic information from hard pairs, we introduce a multi-modal self-paced loss into cross-modal hashing that dynamically reweights multi-modal pairs, enabling the model to gradually focus on hard pairs while still learning universal patterns from multi-modal pairs. Extensive experiments on three benchmark databases demonstrate that the proposed DHaPH framework outperforms the compared baselines under different evaluation metrics. The corresponding code is available at https://github.com/QinLab-WFU/DHaPH.
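The abstract attributes hashing's low storage cost and high retrieval efficiency to compact binary codes. The sketch below is a generic illustration of that retrieval step only. It is not the DHaPH method, whose training losses are not reproduced here; see the linked repository for the authors' code. It assumes each modality's network has already produced a real-valued embedding, binarizes that embedding with the sign function, and ranks a database of codes from the other modality by Hamming distance. The function names and toy vectors are hypothetical.

```python
from typing import List, Sequence


def to_hash_code(embedding: Sequence[float]) -> List[int]:
    """Binarize a real-valued embedding into a {-1, +1} code with the sign function.

    Zero is mapped to +1 by convention (an assumption of this sketch).
    """
    return [1 if x >= 0 else -1 for x in embedding]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance between two K-bit {-1, +1} codes.

    For such codes the inner product is K - 2 * d_H, so d_H = (K - <a, b>) / 2.
    """
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    inner = sum(x * y for x, y in zip(a, b))
    return (len(a) - inner) // 2


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query.

    Python's sort is stable, so ties keep their original index order.
    """
    dists = [hamming_distance(query, code) for code in database]
    return sorted(range(len(database)), key=lambda i: dists[i])


if __name__ == "__main__":
    # Hypothetical toy embeddings: one image query and three text items.
    image_query = to_hash_code([0.9, -0.2, 0.4, -0.7])
    text_db = [
        to_hash_code(v)
        for v in ([0.5, -0.1, 0.3, -0.9], [-0.4, 0.8, -0.6, 0.2], [0.7, 0.3, 0.1, -0.5])
    ]
    print(rank_by_hamming(image_query, text_db))  # [0, 2, 1]
```

Because both modalities are mapped into the same binary Hamming space, an image query can be compared directly against text codes. In practice, codes are usually packed into machine words so that the distance becomes an XOR followed by a popcount, which is where the efficiency gain comes from.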