Abstract:Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories that belong to the same superclass. Since the distinctions among similar subcategories are quite subtle and local, it is highly challenging to distinguish them from each other even for humans. So the localization of distinctions is essential for fine-grained visual categorization, and there are two pivotal problems: (1) Which regions are discriminative and representative to distinguish from other subcategories? (2) How many discriminative regions are necessary to achieve the best categorization performance? It is still difficult to address these two problems adaptively and intelligently. Artificial prior and experimental validation are widely used in existing mainstream methods to discover which and how many regions to gaze. However, their applications extremely restrict the usability and scalability of the methods. To address the above two problems, this paper proposes a multi-scale and multi-granularity deep reinforcement learning approach (M2DRL), which learns multi-granularity discriminative region attention and multi-scale region-based feature representation. Its main contributions are as follows: (1) Multi-granularity discriminative localization is proposed to localize the distinctions via a two-stage deep reinforcement learning approach, which discovers the discriminative regions with multiple granularities in a hierarchical manner (which problem), and determines the number of discriminative regions in an automatic and adaptive manner (how many problem). (2) Multi-scale representation learning helps to localize regions in different scales as well as encode images in different scales, boosting the fine-grained visual categorization performance. (3) Semantic reward function is proposed to drive M2DRL to fully capture the salient and conceptual visual information, via jointly considering attention and category information in the reward function. It allows the deep reinforcement learning to localize the distinctions in a weakly supervised manner or even an unsupervised manner. (4) Unsupervised discriminative localization is further explored to avoid the heavy labor consumption of annotating, and extremely strengthen the usability and scalability of our M2DRL approach. Compared with state-of-the-art methods on two widely-used fine-grained visual categorization datasets, our M2DRL approach achieves the best categorization accuracy.

Learning Regions and Descriptors for Fine-grained Recognition

Fine-Grained Visual Categorization With Fine-Tuned Segmentation

Multiple Granularity Descriptors for Fine-Grained Categorization.

Attention cutting and padding learning for fine-grained image recognition

Region based Ensemble Learning Network for Fine-grained Classification

Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

Picking Neural Activations for Fine-Grained Recognition

Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization

Interpretable and Accurate Fine-grained Recognition via Region Grouping

Fine-Grained Imag E Categorization by Localizing Tiny Object Parts from Unannotated Images

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition

Part Attentioned Fine-Grained Categorization Using Context Sensitive Flow

Re-rank Coarse Classification with Local Region Enhanced Features for Fine-Grained Image Recognition

Fine-grained Category Discovery under Coarse-grained Supervision with Hierarchical Weighted Self-contrastive Learning

Fused One-Vs-all Mid-Level Features for Fine-Grained Visual Categorization

From the whole to detail: Progressively sampling discriminative parts for fine-grained recognition

A Survey on Fine-grained Image Categorization Using Deep Convolutional Features

Weakly Supervised Fine-Grained Categorization With Part-Based Image Representation

Picking Deep Filter Responses for Fine-Grained Image Recognition

Fine-Grained Image Categorization by Localizing TinyObject Parts from Unannotated Images

Multi-level Dictionary Learning for Fine-Grained Images Categorization with Attention Model