Scene Image Retrieval with Siamese Spatial Attention Pooling

Jinyu Ma, Xiaodong Gu
DOI: https://doi.org/10.1016/j.neucom.2020.05.090
IF: 6
2020-01-01
Neurocomputing
Abstract: Content-based image retrieval (CBIR) aims to retrieve images from a given collection according to the similarity of their contents. In this paper, we focus on the retrieval of scene images. We propose a siamese spatial attentive model, which is based on a siamese architecture and incorporates an attention mechanism to generate compatible image embeddings. It extracts local features with a convolutional neural network (CNN) that is initialized with pre-trained parameters and fine-tuned for retrieval. We propose spatial attention pooling, which takes feature maps as input and generates weights for the local features; these weights are then used to refine the local features via weighted sum-pooling. This pooling reduces the impact of distracting content and concentrates on the meaningful parts of an image, so the model can output robust representations for noisy images. We also propose a multi-stage training scheme for the model, which leads to better performance than the usual one-pass training scheme. Extensive experiments on benchmark image retrieval datasets show that our model is competitive in retrieval performance.
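The weighted sum-pooling step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the attention weights are computed, so the scoring function here (a single linear projection per spatial position followed by a softmax over positions) is an assumption, as are the names `spatial_attention_pool`, `w`, and `b` and the final L2 normalization. It is not the authors' implementation.

```python
import numpy as np

def spatial_attention_pool(feature_map, w, b=0.0):
    """Pool an (H, W, C) feature map into a single C-dim embedding.

    w (shape (C,)) and b are hypothetical scoring parameters; the
    linear-plus-softmax scoring is an assumption, not the paper's method.
    """
    h, wd, c = feature_map.shape
    local = feature_map.reshape(h * wd, c)   # one C-dim local feature per position
    scores = local @ w + b                   # one scalar score per position
    scores = scores - scores.max()           # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over positions
    pooled = weights @ local                 # weighted sum-pooling -> (C,)
    norm = np.linalg.norm(pooled)
    embedding = pooled / norm if norm > 0 else pooled  # L2-normalize (assumed)
    return embedding, weights.reshape(h, wd)

# Example with random data standing in for CNN feature maps.
rng = np.random.default_rng(0)
fmap = rng.random((7, 7, 16))
w = rng.standard_normal(16)
emb, attn = spatial_attention_pool(fmap, w)
```

In a siamese setup, the same function with shared parameters would be applied to both images of a pair, and the resulting embeddings compared, for example by dot product. With `w` set to all zeros the weights become uniform and the sketch reduces to plain average pooling, which makes the contribution of the attention weights easy to see.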