Deep Saliency Hashing

Sheng Jin, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Lei Zhang, Xiansheng Hua
DOI: https://doi.org/10.48550/arXiv.1807.01459
2019-02-01
Abstract: In recent years, hashing methods have proven effective and efficient for large-scale Web media search. However, existing general hashing methods have limited discriminative power for describing fine-grained objects that share a similar overall appearance but differ subtly. To solve this problem, we introduce, for the first time, the attention mechanism to the learning of fine-grained hashing codes. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and learns semantic-preserving hashing codes simultaneously. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components: the semantic loss, the saliency loss, and the quantization loss. As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on fine-grained datasets, including Oxford Flowers-17, Stanford Dogs-120, and CUB Bird, demonstrate that DSaH performs best on the fine-grained retrieval task and beats the strongest competitor (DTQ) by approximately 10% on both Stanford Dogs-120 and CUB Bird. DSaH is also comparable to several state-of-the-art hashing methods on general datasets, including CIFAR-10 and NUS-WIDE.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limited discriminative power of existing general-purpose hashing methods when describing fine-grained objects in large-scale image retrieval. Fine-grained objects are objects that look similar overall but differ in subtle ways. To overcome this limitation, the authors introduce the attention mechanism into the learning of fine-grained hash codes for the first time and propose a new model named Deep Saliency Hashing (DSaH). DSaH automatically mines salient regions and simultaneously learns semantic-preserving hash codes. It is a two-step end-to-end model consisting of an attention network and a hashing network. Its loss function has three basic components: the semantic loss, the saliency loss, and the quantization loss. Of these, the saliency loss guides the attention network to mine discriminative regions from image pairs. In extensive experiments, DSaH beats the strongest competitor (DTQ) by approximately 10% on the fine-grained datasets Stanford Dogs-120 and CUB Bird, and is comparable to several state-of-the-art hashing methods on the general datasets CIFAR-10 and NUS-WIDE.
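The summary names three loss components and binary hash codes but gives no formulas, so the following is only a generic sketch of how hashing-based retrieval is commonly set up, not the DSaH implementation. The sign-based binarization, the squared-gap form of the quantization penalty, the weights `alpha` and `beta`, and all function names are assumptions made for illustration. The semantic and saliency losses are passed in as plain numbers because their definitions are not stated here.

```python
from typing import List, Sequence


def binarize(h: Sequence[float]) -> List[int]:
    """Map real-valued network outputs to a {-1, +1} code by sign (assumed scheme)."""
    return [1 if v >= 0 else -1 for v in h]


def quantization_loss(h: Sequence[float]) -> float:
    """Mean squared gap between each output and its binarized value.

    This is one common form of quantization penalty, assumed here;
    the paper's exact definition is not given in this summary.
    """
    b = binarize(h)
    return sum((v - s) ** 2 for v, s in zip(h, b)) / len(h)


def total_loss(semantic: float, saliency: float, quantization: float,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of the three components; alpha and beta are hypothetical weights."""
    return semantic + alpha * saliency + beta * quantization


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))
```

At retrieval time only the binary codes are compared, which is why hashing suits large-scale search: `rank_by_hamming(binarize(query_output), codes)` orders the database by Hamming distance without touching the original images.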