Abstract: Fine-grained visual categorization, which aims to identify the different subcategories of images within the same category, is a very challenging task due to large intra-class variations and subtle inter-class differences. Most existing methods focus on salient local regions and ignore other features that may help recognize the images more precisely. To address this issue, we propose a novel end-to-end network for fine-grained visual categorization composed of self-calibrated convolution, a gradual attention module and a feature inverse module. To extract salient features, we exploit self-calibrated convolution, which reduces the influence of irrelevant information and locates salient regions more accurately. To extract discriminative features, we propose the gradual attention module, which consists of alternating channel-spatial attention and hierarchical feature grouping. The gradual attention module can gradually extract subtle discriminative features even when the semantic information in the shallow stages is not rich. Moreover, we design the feature inverse module, which, by inverting features, forces the next stage of the network to search for other useful features. Combined with the feature inverse module, the gradual attention module can find more detailed regions and improves classification performance. Finally, the stage features and fused features are jointly used for classification. The proposed method is evaluated on three classic fine-grained image datasets and compared with a number of state-of-the-art methods. It achieves accuracies of 89.5%, 95.2% and 93.9% on the CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets, respectively. The experimental results demonstrate the effectiveness and superiority of the proposed method.
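The abstract names the modules but gives no equations for them. The following pure-Python sketch is one hedged reading of how alternating channel-spatial attention and a feature inverse step could interact. It assumes that channel weights come from a sigmoid over global average pooling, that spatial weights come from a sigmoid over the channel-wise mean, and that "feature inverse" means reweighting features by one minus the spatial attention map. All function names (`channel_attention`, `spatial_attention_map`, `feature_inverse`, `attention_stage`) are hypothetical, and the learned parts of the real network (MLPs, convolutions, self-calibrated convolution, hierarchical feature grouping, stage fusion) are omitted.

```python
import math
from typing import List, Tuple

# Feature map layout (assumption): [channels][height][width] as nested lists.
FeatureMap = List[List[List[float]]]
AttentionMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap: FeatureMap) -> FeatureMap:
    """Scale each channel by sigmoid(global average of that channel).

    Simplification: a learned MLP would normally sit between pooling and sigmoid.
    """
    out = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        w = sigmoid(mean)
        out.append([[v * w for v in row] for row in ch])
    return out


def spatial_attention_map(fmap: FeatureMap) -> AttentionMap:
    """One weight per location: sigmoid of the channel-wise mean (no learned conv)."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]


def apply_map(fmap: FeatureMap, amap: AttentionMap) -> FeatureMap:
    """Multiply every channel element-wise by the same spatial map."""
    return [[[v * amap[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in fmap]


def feature_inverse(fmap: FeatureMap, amap: AttentionMap) -> FeatureMap:
    """Hypothetical feature inverse: weight by (1 - attention) to suppress salient regions."""
    inverse = [[1.0 - a for a in row] for row in amap]
    return apply_map(fmap, inverse)


def attention_stage(fmap: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Channel attention, then spatial attention; return (attended, inverted) features."""
    x = channel_attention(fmap)
    amap = spatial_attention_map(x)
    return apply_map(x, amap), feature_inverse(x, amap)


if __name__ == "__main__":
    x = [[[1.0, 4.0], [1.0, 1.0]]]  # one channel, 2x2, strong response at (0, 1)
    attended, inverted = attention_stage(x)
    print(attended)
    print(inverted)
```

With the input `[[[1.0, 4.0], [1.0, 1.0]]]`, the attended output keeps the strong response at position (0, 1) dominant, while the inverted output makes it the weakest location. This matches the role the abstract assigns to the feature inverse module: the next stage is pushed toward regions the current stage did not emphasize.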

SupCon-ViT: Supervised contrastive learning for ultra-fine-grained visual categorization

Fine-Grained Visual Categorization With Fine-Tuned Segmentation

CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization

Fine-grained Category Discovery under Coarse-grained Supervision with Hierarchical Weighted Self-contrastive Learning

Learning More Discriminative Clues with Gradual Attention for Fine-Grained Visual Categorization

Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples

Discriminative Region Enhancing and Suppression Network for Fine-Grained Visual Categorization

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

Learning Enhanced Features and Inferring Twice for Fine-Grained Image Classification

Diving into Continual Ultra-fine-grained Visual Categorization

C3T: Contrastive Consistency Cross-Network Learning for Semi-Supervised Semantic Segmentation

AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification

Progressive Co-Attention Network for Fine-grained Visual Classification

A Survey of Fine-Grained Image Categorization

Fine-Grained Visual Categorization: A Spatial–Frequency Feature Fusion Perspective

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization

Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization

A Survey on Fine-grained Image Categorization Using Deep Convolutional Features

Coarse-to-Fine Description for Fine-Grained Visual Categorization