Abstract: Fine-grained image classification, which aims to recognize hundreds of sub-categories belonging to the same basic-level category, is challenging because of large intra-class variance and small inter-class variance. Since two different sub-categories are distinguished only by subtle differences in specific parts, semantic part localization is crucial for fine-grained image classification. Most previous works improve accuracy by locating the semantic parts, but rely heavily on object or part annotations, whose labeling is costly. Recently, some researchers have begun to focus on recognizing sub-categories via weakly supervised part detection instead of using the expensive annotations. However, these works ignore the spatial relationship between the object and its parts as well as the interaction among the parts, both of which help part selection. Therefore, this paper proposes a weakly supervised part selection method with spatial constraints for fine-grained image classification, which does not use any bounding box or part annotations. We first learn a whole-object detector automatically to localize the object by jointly using saliency extraction and co-segmentation. Then two spatial constraints are proposed to select the discriminative parts. The first spatial constraint, called the box constraint, defines the relationship between the object and its parts, and aims to ensure that the selected parts are located inside the object region and have the largest overlap with it. The second spatial constraint, called the parts constraint, defines the relationship among the object's parts, and aims to reduce the parts' overlap with each other, which avoids information redundancy and ensures that the selected parts are the ones that best distinguish the sub-category from other categories. Combining the two spatial constraints significantly improves part selection and achieves a notable improvement in fine-grained image classification. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of our method, even compared with methods that use expensive annotations.
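
To make the two spatial constraints concrete, here is a minimal Python sketch of how candidate part boxes could be filtered and selected. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not specify how the constraints are scored or optimized, so the greedy selection, the function name `select_parts`, the per-candidate `scores`, and the thresholds `min_inside` and `max_part_iou` are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def select_parts(object_box: Box,
                 candidates: List[Box],
                 scores: List[float],
                 num_parts: int = 2,
                 min_inside: float = 0.9,    # assumed threshold
                 max_part_iou: float = 0.3   # assumed threshold
                 ) -> List[int]:
    """Return indices of selected part candidates.

    Box constraint (assumed form): keep a candidate only if at least
    `min_inside` of its area lies inside the object box.
    Parts constraint (assumed form): greedily pick high-scoring
    candidates whose IoU with every already-selected part is at most
    `max_part_iou`, which limits redundancy between parts.
    """
    kept = [i for i, c in enumerate(candidates)
            if area(c) > 0
            and intersection(c, object_box) / area(c) >= min_inside]
    kept.sort(key=lambda i: scores[i], reverse=True)

    selected: List[int] = []
    for i in kept:
        if all(iou(candidates[i], candidates[j]) <= max_part_iou
               for j in selected):
            selected.append(i)
        if len(selected) == num_parts:
            break
    return selected


if __name__ == "__main__":
    object_box = (0, 0, 100, 100)
    candidates = [(10, 10, 50, 50), (12, 12, 52, 52),
                  (60, 60, 95, 95), (90, 90, 150, 150)]
    scores = [0.9, 0.8, 0.7, 0.95]
    print(select_parts(object_box, candidates, scores))  # [0, 2]
```

In this toy input, the highest-scoring candidate (index 3) is rejected by the box constraint because it lies mostly outside the object box, and candidate 1 is rejected by the parts constraint because it nearly duplicates candidate 0.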

Discriminative Semantic Parts Learning For Object Detection

Weakly Supervised Learning of Part Selection Model with Spatial Constraints for Fine-Grained Image Classification

Detecting Semantic Parts on Partially Occluded Objects

Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints From Limited Training Data

Discriminative Middle-Level Parts Mining for Object Detection

Human Detection Method Based on Multi-Part Detector and Multi-Instance Learning

Discriminative Hough-Voting For Object Detection With Parts

Learning Dictionary of Discriminative Part Detectors for Image Categorization and Cosegmentation

Ensemble of Part Detectors for Simultaneous Classification and Localization

Part-Aware Segmentation for Fine-Grained Categorization

Max-margin Analysis Based Patch Sampling for Discovery of Mid-Level Parts

Orientational Spatial Part Modeling for Fine-Grained Visual Categorization

Object-Part Attention Driven Discriminative Localization for Fine-grained Image Classification

Going Denser with Open-Vocabulary Part Segmentation

DSP: Discriminative Spatial Part Modeling for Fine-Grained Visual Categorization

Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes

Learning Cascaded Shared-Boost Classifiers for Part-Based Object Detection

Parts4Feature: Learning 3D Global Features from Generally Semantic Parts in Multiple Views

Semantic Object Segmentation Via Detection in Weakly Labeled Video

A Multi-Stage Segmentation Based on Inner-Class Relation with Discriminative Learning

Subcategory-Aware Object Detection