Learning Regions and Descriptors for Fine-grained Recognition

Dequan Wang, Tianjun Xiao, Zhiqiang Shen, Xiangyang Xue
2015-01-01
Abstract: Fine-grained categorization, which aims to distinguish subordinate-level categories such as bird species or dog breeds, is an extremely challenging task due to two main issues: how to localize discriminative regions for recognition and how to learn sophisticated features for representation. In this paper, we develop a joint representation learning framework that simultaneously detects informative regions and distinguishes subtle differences between subordinate-level categories. The region detectors are learned in an unsupervised setting, based on the observation that neural networks for fine-grained recognition exhibit distinctive spatial distributions over regions of interest, from the object level to the part level. The appearance descriptors are the concatenation of hierarchical convolutional neural network features that encode both coarse-grained and fine-grained visual differences. Only image-level labels are necessary for training in our approach, which avoids the need for labor-intensive bounding-box or part annotations at any stage. Experimental results on a challenging fine-grained image dataset demonstrate that, despite using the weakest supervision, our approach outperforms most state-of-the-art methods and even achieves accuracy comparable to that of methods that rely heavily on extra annotations.