Deep noise mitigation and semantic reconstruction hashing for unsupervised cross-modal retrieval

Cheng Zhang, Yuan Wan, Haopeng Qiang
DOI: https://doi.org/10.1007/s00521-023-09331-0
2024-01-03
Neural Computing and Applications
Abstract: Cross-modal hashing has attracted much attention due to its low storage cost and high retrieval efficiency. Compared with their supervised counterparts, unsupervised cross-modal hashing methods suffer severe performance degradation because they lack label guidance. Pseudo-label-based unsupervised methods have proven to be an effective way to improve the discriminative ability of hash codes. However, creating pseudo labels with clustering algorithms introduces various kinds of noise. To mitigate the effects of this noise, we propose a novel deep noise mitigation and semantic reconstruction hashing (DNMSRH) method for unsupervised cross-modal retrieval. Specifically, an unsupervised data balancing strategy is used to search for equivalent training data in each cluster that satisfies a distribution with minimum within-class variance and maximum between-class variance, which effectively mitigates the data noise caused by the misclassification of outliers. Meanwhile, a joint symmetric multi-metric similarity reconstruction framework is constructed, which not only combines the semantic information of heterogeneous modalities but also preserves and extends the pairwise instance correlations of the original features. Furthermore, offline hard and online soft pseudo labels are introduced to mitigate the effects of noisy labels, where the soft pseudo labels are generated by the collaborative training of heterogeneous image and text networks. Extensive experiments on three benchmark datasets for unsupervised cross-modal retrieval demonstrate that DNMSRH significantly outperforms state-of-the-art competitors.
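As a rough illustration of what combining the semantic information of two modalities into one symmetric similarity matrix can look like, the sketch below fuses per-modality cosine similarities with a weighted sum. This is not the DNMSRH formulation, which the abstract does not specify: the single cosine metric, the function names, the weight `alpha`, and the toy features are all assumptions made for illustration.

```python
import math


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in features]
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            denom = norms[i] * norms[j]
            # Zero vectors have no direction, so treat their similarity as 0.
            sim[i][j] = dot / denom if denom > 0 else 0.0
    return sim


def joint_similarity(image_features, text_features, alpha=0.5):
    """Hypothetical fusion: weighted sum of the two modality similarities.

    `alpha` (an assumed hyperparameter) weights the image modality.
    The result is symmetric because each input matrix is symmetric.
    """
    if len(image_features) != len(text_features):
        raise ValueError("image and text features must describe the same instances")
    s_img = cosine_similarity_matrix(image_features)
    s_txt = cosine_similarity_matrix(text_features)
    n = len(s_img)
    return [
        [alpha * s_img[i][j] + (1.0 - alpha) * s_txt[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy paired data: instances 0 and 1 are similar in both modalities,
# instance 2 is unrelated to instance 0 in both.
image_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_features = [[0.0, 2.0, 0.0], [0.1, 1.8, 0.0], [1.0, 0.0, 1.0]]
S = joint_similarity(image_features, text_features, alpha=0.6)
```

In an unsupervised hashing pipeline, a matrix like `S` would typically serve as the target that the inner products of the learned hash codes are trained to reconstruct. A multi-metric variant would fuse several such matrices per modality rather than one.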
computer science, artificial intelligence