Abstract: As an emerging research topic, fine-grained visual categorization has been attracting growing attention in recent years. Due to the large inter-class similarity and intra-class variance, recognizing objects in fine-grained domains is extremely challenging, and sometimes even humans cannot recognize them accurately. The traditional bag-of-words model can obtain desirable results for basic-level category classification through weak alignment with the spatial pyramid matching model. However, it may easily fail in fine-grained domains, since the discriminative features are not only subtle but also extremely localized. These fine differences are often swamped by irrelevant features, which makes the categories virtually impossible to distinguish. To address these problems, we propose a new framework for fine-grained visual categorization. We strengthen the spatial correspondence among parts by incorporating foreground segmentation and part localization. Based on the part representations of the images, we learn a large set of mid-level features that are better suited to fine-grained tasks. Compared with the low-level features extracted directly from the images, the learned one-vs-all mid-level features enjoy the following advantages. First, the dimension of the mid-level features is relatively small. To obtain high classification accuracy, the dimension of the low-level features usually reaches several thousand to tens of thousands, and it becomes even larger when the spatial pyramid model is introduced. In contrast, the dimension of our mid-level features depends on the number of classes, which is far smaller. Second, each entry of the proposed mid-level features is meaningful, which yields a more compact representation of the image. Third, the mid-level features are more robust than the low-level ones, which benefits classification. Fourth, the mid-level features are learned independently, so the learning process can easily be combined with other techniques to boost performance.
We evaluate the proposed approach on two extensive fine-grained datasets, CUB-200-2011 and Stanford Dogs. By learning the mid-level features on top of the popular Fisher vectors and convolutional neural network features, we boost the classification accuracy by a considerable margin and advance the state-of-the-art performance in fine-grained visual categorization.
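The one-vs-all mid-level feature idea described above can be sketched as follows, under our own assumptions. Each localized part (for example, a hypothetical head, body, and whole-object region) is described by a low-level vector such as a Fisher vector or CNN activation. One linear "class c vs. rest" classifier is trained per class on each part, and the vector of class scores becomes the mid-level feature, so its length depends on the number of classes rather than on the low-level dimension. This sketch uses closed-form ridge-regression classifiers instead of the linear SVMs often used for this purpose, and uses plain concatenation as the fusion step. The functions `train_one_vs_all`, `mid_level_features`, and `add_bias`, the synthetic data, and all parameter values are hypothetical and not the paper's implementation.

```python
import numpy as np


def add_bias(X):
    """Append a constant column so each linear classifier has a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_one_vs_all(X, y, num_classes, lam=1.0):
    """Fit one linear classifier per class with closed-form ridge regression.

    Hypothetical stand-in for one-vs-all linear SVMs.
    X: (n, d) low-level features; y: (n,) integer labels in [0, num_classes).
    Returns W of shape (d + 1, num_classes); column c scores "class c vs. rest".
    """
    Xb = add_bias(X)
    n, d1 = Xb.shape
    # One target column per class: +1 for samples of that class, -1 otherwise.
    T = -np.ones((n, num_classes))
    T[np.arange(n), y] = 1.0
    if n < d1:
        # Dual form, cheaper when the feature dimension exceeds the sample count:
        # W = Xb^T (Xb Xb^T + lam I)^-1 T.
        alpha = np.linalg.solve(Xb @ Xb.T + lam * np.eye(n), T)
        return Xb.T @ alpha
    # Primal form: (Xb^T Xb + lam I) W = Xb^T T.
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(d1), Xb.T @ T)


def mid_level_features(part_features, part_models):
    """Concatenate the one-vs-all class scores of every part.

    part_features: list of (n, d_p) arrays, one per part.
    part_models: list of W matrices from train_one_vs_all, one per part.
    Output shape: (n, num_parts * num_classes), independent of each d_p.
    """
    return np.hstack([add_bias(X) @ W for X, W in zip(part_features, part_models)])


# Synthetic demo: 5 classes, 3 hypothetical parts, 1,000-dim low-level features per part.
rng = np.random.default_rng(0)
num_classes, num_parts, dim = 5, 3, 1000
means = rng.normal(size=(num_parts, num_classes, dim))


def sample(labels):
    # One noisy feature matrix per part, centered on that part's class means.
    return [means[p][labels] + 0.5 * rng.normal(size=(len(labels), dim))
            for p in range(num_parts)]


y_train = np.repeat(np.arange(num_classes), 20)
y_test = np.repeat(np.arange(num_classes), 10)
parts_train, parts_test = sample(y_train), sample(y_test)

# Train the per-part one-vs-all classifiers on the low-level features.
models = [train_one_vs_all(X, y_train, num_classes) for X in parts_train]
F_train = mid_level_features(parts_train, models)  # shape (100, 15)
F_test = mid_level_features(parts_test, models)    # shape (50, 15)

# A final one-vs-all classifier on the compact mid-level features.
W_final = train_one_vs_all(F_train, y_train, num_classes)
pred = np.argmax(add_bias(F_test) @ W_final, axis=1)
accuracy = np.mean(pred == y_test)
print(F_train.shape, accuracy)
```

The demo illustrates the dimension drop: 1,000-dimensional features per part become 15 numbers (3 parts × 5 classes), and each entry is the score of one "class c vs. rest" classifier. In practice, the mid-level features used to train the final classifier would usually come from held-out (cross-validated) scores, since scores computed on the classifiers' own training images are overconfident.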

Learning Enhanced Features and Inferring Twice for Fine-Grained Image Classification

Fine-Grained Visual Categorization With Fine-Tuned Segmentation

Selecting Discriminative Features for Fine-Grained Visual Classification

Multi-Granularity Feature Distillation Learning Network for Fine-Grained Visual Classification

Learning Two-level Features for Fine-grained Image Classification

Channel Boosting, Cross-Layer Feature Integration, and Multi-Scale Classification for Fine-Grained Visual Classification

Graph-in-graph Discriminative Feature Enhancement Network for Fine-Grained Visual Classification

Feature Re-Attention and Multi-Layer Feature Fusion for Fine-Grained Visual Classification

Two-stage Fine-Grained Image Classification Model Based on Multi-Granularity Feature Fusion

Significant feature suppression and cross-feature fusion networks for fine-grained visual classification

Learning More Discriminative Clues with Gradual Attention for Fine-Grained Visual Categorization

Granularity-aware Distillation and Structure Modeling Region Proposal Network for Fine-Grained Image Classification

Integrating Foreground–background Feature Distillation and Contrastive Feature Learning for Ultra-Fine-grained Visual Classification

Fine-Grained Visual Classification with Aggregated Object Localization and Salient Feature Suppression

Leveraging Fine-Grained Labels to Regularize Fine-Grained Visual Classification

Learning Semantically Enhanced Feature for Fine-Grained Image Classification

Fine-Grained Visual Categorization by Localizing Object Parts With Single Image

Fine-Grained Visual Classification Via Simultaneously Learning of Multi-regional Multi-grained Features

Fused One-Vs-all Mid-Level Features for Fine-Grained Visual Categorization

Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification

Cross-layer Progressive Attention Bilinear Fusion Method for Fine-Grained Visual Classification