Self-supervised Learning-Based Weight Adaptive Hashing for Fast Cross-Modal Retrieval

Li Yifan, Wang Xuan, Qi Shuhan, Huang Chengkai, Jiang Zoe. L, Liao Qing, Guan Jian, Zhang Jiajia
DOI: https://doi.org/10.1007/s11760-019-01534-0
IF: 1.583
2019-01-01
Signal Image and Video Processing
Abstract: Owing to its low storage cost and fast search speed, hashing is widely used in cross-modal retrieval. However, two crucial bottlenecks remain. First, there are no suitable large datasets for multimodal data. Second, imbalanced instances degrade the accuracy of the retrieval system. In this paper, we propose an end-to-end self-supervised learning-based weight adaptive hashing method for cross-modal retrieval. To address the dataset limitation, we extract fine-grained features directly from labels in a self-supervised fashion and use them to supervise the hash learning of the other modalities. To overcome the problem of imbalanced instances, we design an adaptive weight loss that flexibly adjusts the weights of training samples according to their proportions. In addition, we use a binary approximation regularization term to reduce the regularization error. Experiments on the MIRFLICKR-25K and NUS-WIDE datasets show that our method improves performance by 3% compared with other methods.
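The abstract names two loss components without giving their formulas: a weight that adapts to sample proportions, and a binary approximation term. The sketch below is only an illustration of how such terms are commonly built, not the paper's actual formulation. It assumes an inverse-frequency weighting over similar/dissimilar pairs, a pairwise negative log-likelihood on code inner products, and a squared gap between real-valued codes and their sign-binarized version. All function names (`adaptive_weights`, `weighted_pairwise_loss`, `binary_approximation_penalty`) are hypothetical.

```python
import math


def adaptive_weights(labels):
    """Assumed scheme: weight each pair inversely to its class proportion.

    labels: list of 0/1 indicators (1 = similar pair, 0 = dissimilar pair).
    Each class receives a total weight of n / 2, so the rarer class
    gets a larger per-sample weight.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    w_pos = n / (2 * n_pos) if n_pos else 0.0
    w_neg = n / (2 * n_neg) if n_neg else 0.0
    return [w_pos if s == 1 else w_neg for s in labels]


def weighted_pairwise_loss(inner_products, labels):
    """Weighted pairwise negative log-likelihood (an assumed base loss).

    inner_products: one real-valued score per pair, e.g. half the inner
    product of the two modalities' hash-layer outputs.
    """
    weights = adaptive_weights(labels)
    total = 0.0
    for theta, s, w in zip(inner_products, labels, weights):
        # Numerically stable log(1 + exp(theta)).
        softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
        total += w * (softplus - s * theta)
    return total / len(labels)


def binary_approximation_penalty(codes):
    """Mean squared gap between real-valued codes and sign(codes)."""
    total = 0.0
    count = 0
    for row in codes:
        for h in row:
            b = 1.0 if h >= 0 else -1.0  # binarized bit in {-1, +1}
            total += (b - h) ** 2
            count += 1
    return total / count
```

With one similar pair among four, `adaptive_weights([1, 0, 0, 0])` gives the similar pair three times the weight of each dissimilar pair, which is the rebalancing effect the abstract describes. The penalty is zero only when every code entry is already exactly -1 or +1.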