Deep Self-Enhancement Hashing for Robust Multi-Label Cross-Modal Retrieval

Ge Song, Hanwen Su, Kai Huang, Fengyi Song, Ming Yang
DOI: https://doi.org/10.1016/j.patcog.2023.110079
IF: 8
2024-01-01
Pattern Recognition
Abstract: Cross-modal hashing aims to map data from several modalities into a compact Hamming space for efficient and accurate retrieval. Despite their satisfactory performance, existing approaches rely on the closed-world assumption. In real-world retrieval tasks that involve out-of-distribution (OOD) semantic data, the similarity relationships among known data preserved in the hash codes tend to be disrupted by the unknown data, which degrades retrieval performance. To address this, we present a deep self-enhancing hashing (DSEH) method that simultaneously learns multi-level similarity-preserving hash codes for known multi-label cross-modal data and robustness to OOD instances. Specifically, during training we construct pseudo-OOD samples in the feature space using random linear combinations to explore OOD semantics. Meanwhile, a prototype-based generative model aggregates batch data to sharpen the difference between the representations of known and unknown semantics. Furthermore, we describe a bounded cosine quadrupled loss with a distance bound that preserves the multi-level similarity of multi-label data and, to learn OOD robustness, controls the maximum distance between known data and the minimum distance between known and pseudo-OOD data. Extensive experiments show that DSEH achieves state-of-the-art performance on closed-world tasks and good performance on simulated real-world tasks.
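The idea of building pseudo-OOD samples from random linear combinations of in-batch features can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_pseudo_ood`, the number of mixed samples `k`, and the choice of Dirichlet-distributed convex weights are all assumptions made for illustration, since the abstract does not specify how the combination coefficients are drawn.

```python
import numpy as np


def make_pseudo_ood(features, n_samples, k=2, seed=0):
    """Mix k randomly chosen feature vectors into each pseudo-OOD sample.

    Hypothetical helper: the weights are convex (non-negative, summing to 1),
    which is one possible reading of "random linear combinations".
    """
    rng = np.random.default_rng(seed)
    n, _ = features.shape
    # For every pseudo sample, pick k distinct source rows from the batch.
    idx = np.stack(
        [rng.choice(n, size=k, replace=False) for _ in range(n_samples)]
    )  # shape (n_samples, k)
    # Random mixing weights; each row sums to 1 (assumption, see lead-in).
    weights = rng.dirichlet(np.ones(k), size=n_samples)  # shape (n_samples, k)
    # Weighted sum over the k selected feature vectors.
    return np.einsum("sk,skd->sd", weights, features[idx])


# Usage: a toy batch of 4 feature vectors of dimension 3.
batch = np.arange(12, dtype=float).reshape(4, 3)
pseudo_ood = make_pseudo_ood(batch, n_samples=5, k=2)
```

Because the weights in this sketch are convex, every pseudo sample lies between the feature vectors it mixes; a method that needs samples further from the known classes might use different coefficients or mix samples with disjoint label sets.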