Abstract: As the diversity and volume of images continue to grow, the demand for efficient fine-grained image retrieval has surged across numerous fields. However, current deep learning-based approaches to fine-grained image retrieval often concentrate solely on top-layer features and neglect the information carried in the middle layers, even though this information contains more fine-grained identification content. Moreover, these methods typically employ a uniform weighting strategy during hash code mapping, which risks losing the mapping of critical regions, an irreversible detriment to fine-grained retrieval tasks. To address these problems, we propose a novel method for fine-grained image retrieval that leverages feature fusion and hash mapping techniques. Our approach uses a multi-level feature cascade that emphasizes intermediate-layer image features as well as top-layer ones, and integrates a feature fusion module at each level to enhance the extraction of discriminative information. In addition, we introduce an agent self-attention architecture, marking its first application in this context, which steers the model to prioritize long-range features and further avoids losing critical regions during mapping. Our proposed model significantly outperforms existing state-of-the-art methods, improving retrieval accuracy by an average of 40% for 12-bit hash codes, 22% for 24-bit hash codes, 16% for 32-bit hash codes, and 11% for 48-bit hash codes across five publicly available fine-grained datasets. We also validate the generalization ability and performance stability of the proposed method on another five datasets and with statistical significance tests. Our code can be downloaded from https://github.com/BJFU-CS2012/MuiltNet.git.
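
To make the retrieval pipeline described above concrete, the following is a minimal sketch of the general idea only, not the authors' model: features from two levels are fused, mapped to a short binary hash code, and database items are ranked by Hamming distance to the query. Several things here are assumptions made for illustration: fusion is plain concatenation (the paper uses learned fusion modules and agent self-attention), the projection matrix is hand-picked rather than learned, and all function names and toy values are hypothetical.

```python
from typing import List, Sequence


def fuse_levels(levels: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-level feature vectors (a stand-in for a learned fusion module)."""
    fused: List[float] = []
    for feats in levels:
        fused.extend(feats)
    return fused


def to_hash_code(features: Sequence[float],
                 projection: Sequence[Sequence[float]]) -> List[int]:
    """Project features onto one row per bit, then binarize each value by its sign."""
    code = []
    for row in projection:
        activation = sum(w * x for w, x in zip(row, features))
        code.append(1 if activation >= 0 else 0)
    return code


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code: Sequence[int],
                    database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Hypothetical hand-picked projection: 4 fused features -> 3-bit code.
projection = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
]

# Toy (middle-layer, top-layer) feature pairs; values are made up.
query_levels = ([0.5, -0.2], [0.9, 0.1])
database_levels = [
    ([0.4, -0.1], [0.7, 0.2]),
    ([-0.3, 0.6], [0.1, 0.8]),
    ([0.2, 0.3], [0.5, 0.1]),
]

query_code = to_hash_code(fuse_levels(query_levels), projection)
database_codes = [to_hash_code(fuse_levels(lv), projection) for lv in database_levels]
ranking = rank_by_hamming(query_code, database_codes)
print(query_code, database_codes, ranking)
```

In this toy setup every bit receives the same treatment, which corresponds to the uniform weighting the abstract criticizes; a learned, attention-guided mapping would instead let discriminative regions influence the code more strongly.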

Multi-FusNet: fusion mapping of features for fine-grained image retrieval networks