HEART: Towards Effective Hash Codes under Label Noise

Jinan Sun, Haixin Wang, Xiao Luo, Shikun Zhang, Wei Xiang, Chong Chen, Xian-Sheng Hua
DOI: https://doi.org/10.1145/3503161.3548127
2022-01-01
Abstract: Hashing, which encodes raw data into compact binary codes, has grown in popularity for large-scale image retrieval due to its storage and computation efficiency. Although deep supervised hashing methods have lately shown promising performance, they mostly assume that the semantic labels of training data are noise-free, which is often unrealistic in real-world applications. In this paper, we focus on the practical problem of learning to hash with label noise and propose a novel method called HEART to address it. HEART is a holistic framework that explores latent semantic distributions to select both clean samples and high-confidence pairs, mitigating the impact of label noise. From a statistical perspective, HEART characterizes each image by its multiple augmented views, which can be considered as samples from its latent distribution, and then calculates semantic distances between images using energy distances between their latent distributions. With these semantic distances, we can select confident similar pairs to guide contrastive learning of high-quality hash codes. Moreover, to prevent the memorization of noisy examples, we propose a novel strategy to identify clean samples, namely those with small variations of losses on the latent distributions, and train the network on these clean samples using a pointwise loss. Experimental results on several popular benchmark datasets demonstrate the effectiveness of HEART compared with a wide range of baselines.
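To make the energy-distance idea in the abstract concrete, the following is a minimal sketch of the standard sample-based (squared) energy distance between two sets of feature vectors, where each set stands in for the augmented views of one image. It is not the authors' implementation: the function names `pairwise_mean_distance` and `energy_distance`, the feature dimension, the number of views, and the random features are all assumptions made for illustration.

```python
import numpy as np


def pairwise_mean_distance(a, b):
    """Mean Euclidean distance over all pairs (one row from a, one row from b)."""
    diffs = a[:, None, :] - b[None, :, :]  # shape: (len(a), len(b), dim)
    return np.linalg.norm(diffs, axis=-1).mean()


def energy_distance(x, y):
    """Standard squared energy distance estimate: 2 E|X-Y| - E|X-X'| - E|Y-Y'|.

    x and y are arrays of shape (n_views, dim). Treating each array as the
    augmented views of one image is an assumption of this sketch.
    """
    return (
        2.0 * pairwise_mean_distance(x, y)
        - pairwise_mean_distance(x, x)
        - pairwise_mean_distance(y, y)
    )


# Hypothetical features: images A and B share a latent distribution, C does not.
rng = np.random.default_rng(0)
views_a = rng.normal(0.0, 0.1, size=(8, 16))
views_b = rng.normal(0.0, 0.1, size=(8, 16))
views_c = rng.normal(1.0, 0.1, size=(8, 16))

d_ab = energy_distance(views_a, views_b)  # small: candidate similar pair
d_ac = energy_distance(views_a, views_c)  # large: dissimilar pair
```

Under these assumptions, a pair such as (A, B) with a small distance would be a candidate confident similar pair, while (A, C) would not. How HEART thresholds these distances is not stated in the abstract.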