A Hash Centroid Construction Method with Swin Transformer for Multi-Label Image Retrieval.

Yanzhao Xie, Yangtao Wang, Rukai Wei, Yu Liu, Ke Zhou, Lisheng Fan
DOI: https://doi.org/10.1007/s00521-023-08273-x
2023-01-01
Neural Computing and Applications
Abstract: Quantization-based hashing methods have become increasingly popular because, compared with pairwise/triplet similarity-based methods, they adjust the global data distribution and capture data similarity more accurately. However, existing image quantization hashing approaches adopt fixed hash centers, which consider neither the semantic information of each hash center nor the scale of each object appearing in a multi-label image; as a result, each hash code deviates from its corresponding hash centroid. To address this issue, we propose HCCST, a hash centroid construction method with Swin transformer for multi-label image retrieval. HCCST consists of a hash code generation module, a hash centroid construction module, and an interaction module between each hash code and its corresponding hash centroid. In the hash code generation module, we first adopt Swin transformer to extract the feature vector of each input multi-label image and then generate the initial hash code of this image. In the hash centroid construction module, we first use object semantic information to construct semantic hash centers and then account for object scale by learning an object weight coefficient, from which the hash centroid of each sample is computed. After obtaining both the hash code and the hash centroid of each sample, the interaction module constrains the distance between each hash code and its hash centroid to preserve the similarity between samples. The model is trained end to end, alternately updating the network parameters of the hash code generation module and the hash centroid construction module as well as the object weight coefficient. We conduct extensive experiments on three multi-label image datasets: VOC2012, MS-COCO and NUS-WIDE. The experimental results demonstrate that HCCST achieves better retrieval performance than state-of-the-art image hashing methods.
The open-source code of this project is released at: https://github.com/lzHZWZ/HCCST.git.
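The abstract gives no formulas, so the following is only a minimal sketch of the hash centroid idea under stated assumptions, not the authors' implementation. It assumes that each class has a ±1 hash center, that the centroid of a multi-label sample is the sign of the weighted sum of the centers of the objects present, and that the interaction term is a mean squared distance between a relaxed hash code and that centroid. The function names, the tie-breaking rule and the example weights are all hypothetical.

```python
def hash_centroid(centers, labels, weights):
    """Combine the hash centers of the objects present in one image.

    centers: list of +/-1 vectors, one per class (hypothetical stand-ins
             for the semantic hash centers described in the abstract).
    labels:  0/1 multi-label vector for the image.
    weights: per-object weight coefficients (learned in the paper; fixed
             by hand here for illustration).
    """
    num_bits = len(centers[0])
    total = [0.0] * num_bits
    for center, present, weight in zip(centers, labels, weights):
        if present:
            for k in range(num_bits):
                total[k] += weight * center[k]
    # Binarize by sign; resolving ties to +1 is an assumption of this sketch.
    return [1 if value >= 0 else -1 for value in total]


def centroid_loss(code, centroid):
    """Mean squared distance between a relaxed code in [-1, 1] and its centroid."""
    return sum((c - t) ** 2 for c, t in zip(code, centroid)) / len(code)


def hamming_distance(a, b):
    """Number of bit positions in which two +/-1 codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Three classes, 4-bit codes; the image contains objects of classes 0 and 1.
centers = [[1, 1, -1, -1], [1, -1, 1, -1], [-1, -1, -1, 1]]
labels = [1, 1, 0]

# A larger weight on class 0 pulls the centroid to that class's center ...
centroid_a = hash_centroid(centers, labels, [0.7, 0.3, 0.0])
# ... and a larger weight on class 1 pulls it to the other center.
centroid_b = hash_centroid(centers, labels, [0.3, 0.7, 0.0])
```

In this toy setting the weights decide which object's center dominates the centroid, which is the role the abstract assigns to the object weight coefficient; a fixed, unweighted center could not make that distinction.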