Learning Rich Part Hierarchies with Progressive Attention Networks for Fine-Grained Image Recognition

Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo, Tao Mei
DOI: https://doi.org/10.1109/tip.2019.2921876
IF: 10.6
2019-01-01
IEEE Transactions on Image Processing
Abstract: We investigate the localization of subtle yet discriminative parts for fine-grained image recognition. Based on the observation that such parts typically exist within a hierarchical structure (e.g., from a coarse-scale "head" to a fine-scale "eye" when recognizing bird species), we propose a novel progressive-attention convolutional neural network (PA-CNN) to progressively localize parts at multiple scales. The PA-CNN localizes parts in two steps: a part proposal network (PPN) generates multiple local attention maps, and a part rectification network (PRN) learns part-specific features from each proposal and provides the PPN with refined part locations. This coupling of the PPN and PRN allows them to be optimized in a mutually reinforcing manner, leading to improved pinpointing of fine-grained parts. Moreover, the convolutional parameters for a PPN at a finer scale can be inherited from the PRN at a coarser scale, enabling a rich part hierarchy (e.g., eye and beak in a bird's head) to be learned in a stacked fashion. Case studies show that PA-CNN can precisely identify parts without using bounding box or part annotations. In addition, quantitative evaluations demonstrate that PA-CNN yields state-of-the-art performance on three challenging fine-grained recognition tasks, i.e., CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
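
The idea of proposing several local attention maps and then cropping around each part for a finer scale can be illustrated with a small NumPy sketch. This is not the paper's PPN or PRN: the abstract gives no layer details, so the contiguous channel grouping, the spatial softmax, the fixed square crop, and all function names below are assumptions chosen only to make the data flow concrete.

```python
import numpy as np


def propose_part_attention(features, num_parts):
    """Turn a (C, H, W) feature map into num_parts spatial attention maps.

    Assumption: channels are split into contiguous groups and each group is
    averaged; a learned proposal network would replace this step.
    """
    c, h, w = features.shape
    groups = np.array_split(np.arange(c), num_parts)
    maps = np.stack([features[g].mean(axis=0) for g in groups])
    # Spatial softmax so that each attention map sums to 1.
    flat = maps.reshape(num_parts, -1)
    flat = np.exp(flat - flat.max(axis=1, keepdims=True))
    flat /= flat.sum(axis=1, keepdims=True)
    return flat.reshape(num_parts, h, w)


def part_features(features, attention):
    """Attention-weighted pooling: one C-dimensional vector per part."""
    return np.einsum("khw,chw->kc", attention, features)


def part_boxes(attention, half_size):
    """Square box around each attention peak, clipped to the map borders.

    Returns (y0, y1, x0, x1) tuples with exclusive upper bounds; these are the
    regions a finer-scale stage could zoom into.
    """
    _, h, w = attention.shape
    boxes = []
    for m in attention:
        y, x = np.unravel_index(np.argmax(m), m.shape)
        boxes.append((
            int(max(0, y - half_size)), int(min(h, y + half_size + 1)),
            int(max(0, x - half_size)), int(min(w, x + half_size + 1)),
        ))
    return boxes
```

In a stacked setting, each box would be cropped from the input (or feature map) and passed through the same three steps again, which is one simple way to picture a coarse part such as "head" giving rise to finer parts such as "eye" and "beak".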