Gradient aggregation based fine-grained image retrieval: A unified viewpoint for CNN and Transformer

Han Yu, Huibin Lu, Min Zhao, Zhuoyi Li, Guanghua Gu
DOI: https://doi.org/10.1016/j.patcog.2023.110248
IF: 8
2024-01-06
Pattern Recognition
Abstract: The gradients of CNNs are traditionally used for optimization and visualization. In this paper, we find that a discriminative representation is hidden in the gradients of convolution filters. Based on this, we propose a corresponding feature extraction and aggregation method for fine-grained image retrieval (FGIR). First, we propose a metric for evaluating manually designed loss functions and, based on it, design a loss function derived from Grad-CAM that is used in the testing phase to extract the gradients of the convolution filters. Second, we take these gradients as new features and design a succinct approach to aggregate them into a compact vector, named the Convolution Filters Gradient Aggregation (CFGA) feature. CFGA features can be extracted from both pre-trained and fine-tuned CNN models. Extensive experiments on FGIR verify the effectiveness of the proposed CFGA approach against five supervised state-of-the-art methods and two unsupervised methods on two standard fine-grained retrieval datasets. Moreover, we generalize the CFGA method, designed for CNNs, to the Swin Transformer and propose the Transformer Parameter Gradients Aggregation (TPGA) method, which demonstrates that the core idea of CFGA/TPGA applies to mainstream feature extraction models. We achieve state-of-the-art FGIR performance on the CUB-200-2011 and CARS196 datasets.
computer science, artificial intelligence; engineering, electrical & electronic
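
The pipeline outlined in the abstract (back-propagate a test-time loss to the convolution filters, then pool the filter gradients into one compact vector) can be illustrated with a toy sketch in plain Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the model is a single 1x1 convolution with ReLU, global average pooling, and a linear classifier; the loss is simply the top class score, in the spirit of Grad-CAM's class-score back-propagation; and the aggregation sums each filter's gradient entries before L2 normalization. The function names (pre_activations, class_scores, filter_gradients, aggregate, cosine) are hypothetical.

```python
import math


def pre_activations(x, filters):
    """1x1 convolution. x is C x H x W (nested lists), filters is K x C.
    Returns the pre-ReLU maps z with shape K x H x W."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(f[c] * x[c][i][j] for c in range(C)) for j in range(W)]
             for i in range(H)]
            for f in filters]


def class_scores(x, filters, classifier):
    """ReLU -> global average pooling -> linear classifier (M x K)."""
    z = pre_activations(x, filters)
    pooled = [sum(max(v, 0.0) for row in zk for v in row) / (len(zk) * len(zk[0]))
              for zk in z]
    return [sum(w * p for w, p in zip(row, pooled)) for row in classifier]


def filter_gradients(x, filters, classifier):
    """Analytic gradient of the top class score w.r.t. each filter weight.
    No label is needed: the predicted class defines the loss (assumption)."""
    scores = class_scores(x, filters, classifier)
    top = max(range(len(scores)), key=scores.__getitem__)
    z = pre_activations(x, filters)
    C, H, W = len(x), len(x[0]), len(x[0][0])
    grads = []
    for k, zk in enumerate(z):
        row = []
        for c in range(C):
            # Only positions where the ReLU is active pass gradient back.
            active_sum = sum(x[c][i][j]
                             for i in range(H) for j in range(W)
                             if zk[i][j] > 0)
            row.append(classifier[top][k] * active_sum / (H * W))
        grads.append(row)
    return grads


def aggregate(grads):
    """Collapse K x C filter gradients into a unit-length K-dim descriptor.
    Summing per filter is an illustrative choice, not the paper's rule."""
    vec = [sum(g) for g in grads]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two already L2-normalized descriptors."""
    return sum(p * q for p, q in zip(a, b))
```

Under these assumptions, retrieval would compute aggregate(filter_gradients(image, filters, classifier)) for the query and for every gallery image, then rank the gallery by cosine with the query descriptor. A real model would obtain the filter gradients from an automatic differentiation framework instead of the hand-derived formula used here.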