Abstract: Fine-grained image classification, which aims to recognize hundreds of sub-categories belonging to the same basic-level category, is challenging because of large intra-class variance and small inter-class variance. Since two different sub-categories are distinguished only by subtle differences in specific parts, semantic part localization is crucial for fine-grained image classification. Most previous works improve accuracy by localizing the semantic parts, but rely heavily on object or part annotations, which are costly to label. Recently, some researchers have begun to recognize sub-categories via weakly supervised part detection instead of using expensive annotations. However, these works ignore the spatial relationship between the object and its parts as well as the interaction among the parts, both of which help part selection. Therefore, this paper proposes a weakly supervised part selection method with spatial constraints for fine-grained image classification, which uses no bounding box or part annotations. We first learn a whole-object detector automatically to localize the object by jointly using saliency extraction and co-segmentation. Two spatial constraints are then proposed to select the discriminative parts. The first, called the box constraint, defines the relationship between the object and its parts: it ensures that the selected parts lie within the object region and have the largest overlap with it. The second, called the parts constraint, defines the relationship among the object's parts: it reduces the parts' overlap with each other to avoid information redundancy and ensures that the selected parts are those that best distinguish the sub-category from other categories. Combining the two spatial constraints significantly improves part selection and yields a notable improvement in fine-grained image classification. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of our method, even compared with methods that use expensive annotations.
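The two spatial constraints can be illustrated with a small sketch. It is not the paper's formulation: the abstract gives no equations, so the scoring rule below (sum of part-to-object IoU minus sum of pairwise part IoU), the exhaustive search over combinations, and the names `select_parts`, `n_parts`, and `min_inside` are all assumptions made for illustration. The discriminative term that the abstract mentions (choosing parts that best separate a sub-category from others) is omitted.

```python
from itertools import combinations

# Boxes are (x1, y1, x2, y2) tuples. All names below are hypothetical.

def area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection(a, b):
    # Area of the overlap rectangle; zero when the boxes do not overlap.
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def iou(a, b):
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def select_parts(object_box, proposals, n_parts=2, min_inside=1.0):
    """Pick n_parts proposals under two assumed spatial constraints."""
    # Box constraint (assumed form): keep only proposals whose area lies
    # inside the object box, at least to the fraction min_inside.
    inside = [p for p in proposals
              if area(p) > 0
              and intersection(p, object_box) / area(p) >= min_inside]

    best, best_score = None, float("-inf")
    for combo in combinations(inside, n_parts):
        # Reward overlap with the object region (box constraint) ...
        box_term = sum(iou(p, object_box) for p in combo)
        # ... and penalize overlap among the parts (parts constraint).
        parts_term = sum(iou(p, q) for p, q in combinations(combo, 2))
        score = box_term - parts_term
        if score > best_score:
            best, best_score = combo, score
    return list(best) if best else []
```

With an object box of `(0, 0, 10, 10)` and candidate parts covering the top half, a slightly shifted top half, the bottom half, and a box that sticks out of the object, this sketch discards the last candidate and picks the non-overlapping top and bottom halves. Exhaustive search is used only to keep the example short; it does not scale to large proposal sets.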

Max-margin Analysis Based Patch Sampling for Discovery of Mid-Level Parts

A Maximum Margin Segmentation Selection for Visual Object Detection

Discriminative Middle-Level Parts Mining for Object Detection

Selective Parts For Fine-Grained Recognition

Learning Part-Based Mid-Level Representation for Visual Recognition

Weakly Supervised Learning of Part Selection Model with Spatial Constraints for Fine-Grained Image Classification

From the whole to detail: Progressively sampling discriminative parts for fine-grained recognition

Mejigclu: more effective jigsaw clustering for unsupervised visual representation learning

Human Detection Method Based on Multi-Part Detector and Multi-Instance Learning

Learning a part vocabulary by clustering ensemble for object class recognition

Automatic Discovery and Optimization of Parts for Image Classification

Detecting Semantic Parts on Partially Occluded Objects

Selective Sparse Sampling for Fine-Grained Image Recognition

Propagating Image-Level Part Statistics to Enhance Object Detection

PatchNet: Maximize the Exploration of Congeneric Semantics for Weakly Supervised Semantic Segmentation

Object-Part Attention Driven Discriminative Localization for Fine-grained Image Classification

Weakly Supervised PatchNets: Learning Aggregated Patch Descriptors for Scene Recognition

Unsupervised learning of object semantic parts from internal states of CNNs by population encoding

Unsupervised Part Segmentation through Disentangling Appearance and Shape

Mid-level Deep Pattern Mining

Weakly Supervised Semantic Segmentation via Progressive Patch Learning