Abstract: Fine-grained visual categorization, which aims to distinguish the subcategories of images within the same category, is a very challenging task because of large intra-class differences and subtle inter-class variances. Existing methods mostly focus on salient local regions and ignore other features that could help recognize images more precisely. To address this issue, we propose a novel end-to-end network for fine-grained visual categorization composed of self-calibrated convolution, a gradual attention module, and a feature inverse module. To extract salient features, we exploit self-calibrated convolution, which can avoid the influence of irrelevant information and locate salient regions more accurately. To extract discriminative features, we propose the gradual attention module, which consists of alternate channel-spatial attention and hierarchical feature grouping. This module can extract subtle discriminative features gradually, even when the semantic information of shallow stages is not rich. Moreover, we design the feature inverse module, which forces the next stage of the network to search for other useful features by feature inverse. Combined with the feature inverse module, the gradual attention module finds more detailed regions and thereby improves classification performance. Finally, the stage features and fused features are jointly used for classification. The proposed method is evaluated on three classical fine-grained image datasets and compared with a number of state-of-the-art methods. It achieves accuracies of 89.5%, 95.2%, and 93.9% on the CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets, respectively. The experimental results demonstrate the effectiveness and superiority of the proposed method.
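The interplay between attention and feature inverse can be illustrated with a toy sketch. The abstract does not define either module precisely, so everything below is an assumption made for illustration: the function names are hypothetical, the learned layers of a real attention module are replaced by plain pooling followed by a sigmoid, and "feature inverse" is read as multiplying the features by one minus the spatial attention map. Plain Python lists are used only to keep the example dependency-free; it is not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """One weight per channel: global average pooling followed by a sigmoid.

    feat is a list of C channels, each an H x W list of lists.
    (Assumption: a real module would insert learned layers before the sigmoid.)
    """
    weights = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_attention(feat):
    """H x W map: sigmoid of the cross-channel mean at each location."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def channel_spatial_attention(feat):
    """Apply channel attention first, then spatial attention."""
    cw = channel_attention(feat)
    scaled = [[[v * cw[k] for v in row] for row in ch] for k, ch in enumerate(feat)]
    smap = spatial_attention(scaled)
    attended = [
        [[v * smap[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
        for ch in scaled
    ]
    return attended, smap


def feature_inverse(feat, smap):
    """Hypothetical reading of 'feature inverse': damp locations that were
    already strongly attended, so a later stage must look elsewhere."""
    return [
        [[v * (1.0 - smap[i][j]) for j, v in enumerate(row)] for i, row in enumerate(ch)]
        for ch in feat
    ]
```

Under this reading, a stage would classify from the output of `channel_spatial_attention` and pass `feature_inverse(feat, smap)` to the next stage, so that locations with high attention contribute less to what the next stage sees.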

Learning More Discriminative Clues with Gradual Attention for Fine-Grained Visual Categorization.
