From the whole to detail: Progressively sampling discriminative parts for fine-grained recognition
Chen Guo, Yaojin Lin, Shengyu Chen, Zhichun Zeng, Mingwen Shao, Shaozi Li
DOI: https://doi.org/10.1016/j.knosys.2021.107651
2022-01-01
Abstract: Fine-grained image recognition poses a special challenge because inter-class differences are subtle and intra-class variances are large. Existing weakly supervised approaches tend to capture the most discriminative regions, thereby guiding the network to learn fine-grained features. However, current methods neglect the correlation between the object and its details, even though object localization is conducive to part detection. In addition, they generally incur heavy computational cost to find details with an auxiliary subnetwork or a selection strategy, and they require well-designed bounding boxes that are inflexible for targets of different scales. In this paper, we propose a more lightweight framework that progressively samples discriminative parts to learn details from coarse scale to fine scale, without any pre-designed bounding boxes. Our method first amplifies the object (e.g., bird, car) from the original image according to class visual patterns; a self-adaptive region sampler is then applied to detect the most informative regions from attention maps and learn fine-grained representations. The framework consists of three streams, i.e., the whole, the object, and the detail, so that hierarchical features can be preserved and learned. Furthermore, our approach can be trained end-to-end in a weakly supervised manner, and little computational cost is needed at the inference phase. Comprehensive experiments and ablation studies demonstrate that the proposed method obtains competitive performance on three benchmarks.
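The idea of locating an object from an attention map without pre-designed boxes can be illustrated with a minimal sketch. This is not the authors' sampler: the function names `attention_bbox` and `crop`, the mean-based threshold, and the `ratio` parameter are all assumptions chosen for illustration, and the attention map here is a hand-written toy grid rather than a network output.

```python
# Hypothetical illustration: derive a crop box from an attention map by
# thresholding at a multiple of the mean activation (an assumed heuristic,
# not the method described in the paper).

def attention_bbox(attention, ratio=1.0):
    """Return (top, left, bottom, right) enclosing cells above ratio * mean."""
    rows = len(attention)
    cols = len(attention[0])
    mean = sum(sum(row) for row in attention) / (rows * cols)
    threshold = ratio * mean
    hits = [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if attention[r][c] > threshold
    ]
    if not hits:
        # Nothing stands out: fall back to the whole map.
        return 0, 0, rows, cols
    row_ids = [r for r, _ in hits]
    col_ids = [c for _, c in hits]
    # Bottom and right are exclusive so the box can be used directly in slices.
    return min(row_ids), min(col_ids), max(row_ids) + 1, max(col_ids) + 1


def crop(grid, box):
    """Cut the region given by (top, left, bottom, right) out of a 2-D list."""
    top, left, bottom, right = box
    return [row[left:right] for row in grid[top:bottom]]


# Toy 4x4 attention map with a bright 2x2 region in the middle.
attention = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
box = attention_bbox(attention)
patch = crop(attention, box)
```

Because the box is derived from the activations themselves, its size adapts to the target's scale. In a coarse-to-fine setting, the same step could in principle be repeated on the cropped region to move from the object toward a detail.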
computer science, artificial intelligence