Abstract: To capture feature information efficiently in fine-grained image classification, this study introduces a new network model based on hybrid attention. The model is built on a hybrid attention module (MA) and, with the help of an attention erasure module (EA), adaptively enhances the prominent regions of an image and captures more detailed image information. Specifically, the attention module applies the attention mechanism along both the channel and spatial dimensions. This highlights the important regions and key feature channels in the image, so that discriminative local features can be extracted. The attention erasure module (EA) removes the salient regions of the image based on the features already identified, shifting the network's focus to other feature details and improving the diversity and completeness of the features. In addition, this study modifies the pooling layer of ResNet50 to enlarge the receptive field and strengthen feature extraction in the network's shallow layers. The resulting features are then fused to form the final feature representation used for classification. To assess the effectiveness of the proposed model, experiments were conducted on three publicly available fine-grained image classification datasets: Stanford Cars, FGVC-Aircraft, and CUB-200-2011. The method achieved classification accuracies of 92.8%, 94.0%, and 88.2% on these datasets, respectively. Compared with existing approaches, the method shows higher accuracy and robustness.
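The interplay between the two modules can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the layers inside MA or EA, so this version is parameter-free (pooled statistics passed through a sigmoid, with no learned weights), and the function names `hybrid_attention` and `erase_salient` as well as the `ratio` threshold are hypothetical choices made for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hybrid_attention(features):
    """Apply channel attention, then spatial attention (illustrative only).

    features: array of shape (C, H, W).
    Returns the re-weighted features and the (H, W) spatial attention map.
    """
    # Channel attention: one weight per channel from average- and max-pooled
    # statistics. A learned module would typically pass these through an MLP;
    # that step is omitted here (assumption).
    avg_c = features.mean(axis=(1, 2))
    max_c = features.max(axis=(1, 2))
    channel_w = sigmoid(avg_c + max_c)                 # shape (C,)
    refined = features * channel_w[:, None, None]

    # Spatial attention: one weight per location from cross-channel statistics.
    avg_s = refined.mean(axis=0)
    max_s = refined.max(axis=0)
    spatial_map = sigmoid(avg_s + max_s)               # shape (H, W)
    return refined * spatial_map[None, :, :], spatial_map


def erase_salient(features, spatial_map, ratio=0.8):
    """Zero out locations whose attention is at least ratio * max attention.

    `ratio` is a hypothetical threshold, not a value from the paper.
    Returns the erased features and the boolean keep-mask of shape (H, W).
    """
    keep = spatial_map < ratio * spatial_map.max()
    return features * keep[None, :, :], keep


# Usage: attend once, erase the most salient locations, then attend again so
# the second pass is forced to look at the remaining regions.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
attended, smap = hybrid_attention(x)
erased, keep = erase_salient(x, smap)
attended_2, _ = hybrid_attention(erased)
```

Under these assumptions, the features from the first pass and from the pass over the erased input could then be concatenated, which corresponds to the feature fusion step described in the abstract.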


Fine-grained image classification method based on hybrid attention module
