Abstract: Fine-grained visual categorization, which aims to distinguish subcategories of images within the same basic category, is very challenging because of large intra-class differences and subtle inter-class variances. Existing methods mostly focus on salient local regions and ignore other features that could help recognize images more precisely. To address this issue, we propose a novel end-to-end network for fine-grained visual categorization composed of self-calibrated convolution, a gradual attention module and a feature inverse module. To extract salient features, we exploit self-calibrated convolution, which avoids the influence of irrelevant information and locates salient regions more accurately. To extract discriminative features, we propose the gradual attention module, which consists of alternating channel-spatial attention and hierarchical feature grouping. This module extracts subtle discriminative features gradually, even when the semantic information of shallow stages is not rich. Moreover, we design the feature inverse module, which uses feature inversion to force the next stage of the network to search for other useful features. Combined, the gradual attention module and the feature inverse module find more detailed regions and thereby improve classification performance. Finally, the stage features and fused features are jointly used for classification. The proposed method is evaluated on three classical fine-grained image datasets and compared with a number of state-of-the-art methods. It achieves accuracies of 89.5%, 95.2% and 93.9% on the CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets, respectively. The experimental results demonstrate the effectiveness and superiority of the proposed method.
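
The abstract does not give the exact formulation of the feature inverse module, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes that a spatial attention map can be obtained as the sigmoid of the channel-averaged activation, and that "feature inverse" can be approximated by weighting features with one minus that map. The function names `spatial_attention` and `feature_inverse` and the toy feature tensor are hypothetical.

```python
import math


def spatial_attention(features):
    """Return an H x W attention map in (0, 1).

    `features` is a list of C channel maps, each an H x W nested list.
    Assumption: attention = sigmoid(mean activation over channels).
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    attention = []
    for i in range(height):
        row = []
        for j in range(width):
            mean_activation = sum(features[c][i][j] for c in range(channels)) / channels
            row.append(1.0 / (1.0 + math.exp(-mean_activation)))
        attention.append(row)
    return attention


def feature_inverse(features, attention):
    """Down-weight locations that the attention map already marks as salient.

    Assumption: inversion is modelled as multiplication by (1 - attention).
    """
    return [
        [
            [value * (1.0 - attention[i][j]) for j, value in enumerate(row)]
            for i, row in enumerate(channel)
        ]
        for channel in features
    ]


# Toy input: 2 channels of size 2 x 2. Location (0, 0) is the most salient
# region; location (1, 1) carries a weaker, secondary cue in channel 1.
toy_features = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[4.0, 0.0], [0.0, 2.0]],
]
toy_attention = spatial_attention(toy_features)
toy_inverted = feature_inverse(toy_features, toy_attention)
```

In this toy case the strongest response before inversion sits at location (0, 0), while after inversion the secondary cue at (1, 1) has the larger response in channel 1. This mirrors, under the stated assumptions, how a following stage could be pushed toward features other than the ones already found.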

Aggregate Attention Module for Fine-Grained Image Classification

Mixed Attention Mechanism for Small-Sample Fine-grained Image Classification

Fine-grained image classification method based on hybrid attention module

HAM: Hybrid Attention Module in Deep Convolutional Neural Networks for Image Classification

Subtler mixed attention network on fine-grained image classification

Attention Graph: Learning Effective Visual Features for Large-Scale Image Classification

Fine-grained Image Recognition Via Attention Interaction and Counterfactual Attention Network

Fine-grained Image Recognition Based on Attention Map and Image Sampling

Focus Longer to See Better: Recursively Refined Attention for Fine-Grained Image Classification

Fine-Grained Recognition Via Attribute-Guided Attentive Feature Aggregation

Dual attention guided multi-scale CNN for fine-grained image classification

Learning More Discriminative Clues with Gradual Attention for Fine-Grained Visual Categorization

Research on classification algorithms for attention mechanism

Bilinear Residual Attention Networks for Fine-Grained Image Classification

The Application of Two-Level Attention Models in Deep Convolutional Neural Network for Fine-Grained Image Classification

Adversarial Erasing Attention for Fine-Grained Image Classification

Weakly supervised fine-grained image classification via two-level attention activation model

Attention-based cropping and erasing learning with coarse-to-fine refinement for fine-grained visual classification

Beyond the Attention: Distinguish the Discriminative and Confusable Features For Fine-grained Image Classification

Improved deep CNNs based on Nonlinear Hybrid Attention Module for image classification

Feature Channel Adaptive Enhancement for Fine-Grained Visual Classification