Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Bird Species Categorization

Xiu-Shen Wei, Chen-Wei Xie, Jianxin Wu, Chunhua Shen
DOI: https://doi.org/10.1016/j.patcog.2017.10.002
IF: 8
2017-01-01
Pattern Recognition
Abstract:
• To the best of our knowledge, Mask-CNN is the first end-to-end model that selects deep convolutional descriptors for object recognition, especially for fine-grained image recognition.
• We present a novel and efficient part-based three-stream model for fine-grained recognition. Because it discards the fully connected layers, the proposed M-CNN is computationally efficient (cf. Table 1 and Table 4 in the experiments). In addition, M-CNN has a smaller feature dimensionality than state-of-the-art methods. Beyond this, it achieves the highest classification accuracy among published methods on CUB200-2011 and Birdsnap.
• The part localization performance of the proposed model surpasses that of other part-based fine-grained approaches that require additional bounding boxes. In particular, M-CNN's head localization on CUB200-2011 is 12.76% higher than the state of the art.
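The idea of selecting convolutional descriptors with a part mask and building a three-stream representation without fully connected layers can be illustrated with a short sketch. This is a minimal sketch, not the authors' implementation. It assumes the part and object masks have already been predicted and resized to the H × W grid of the last convolutional layer. The pooling choice (average- and max-pooling of the kept descriptors, concatenated and L2-normalized), the stream names, and the helpers `select_descriptors`, `aggregate`, and `three_stream_feature` are all hypothetical illustrations.

```python
import numpy as np


def select_descriptors(feature_map: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the C-dim descriptors at spatial positions where mask is True.

    feature_map: (H, W, C) activations from the last conv layer (assumed input).
    mask: (H, W) boolean part/object mask on the same grid (assumed input).
    Returns an (N, C) array in row-major order, N = number of selected positions.
    """
    if feature_map.shape[:2] != mask.shape:
        raise ValueError("mask must match the spatial size of the feature map")
    return feature_map[mask]


def aggregate(descriptors: np.ndarray) -> np.ndarray:
    """Average- and max-pool the kept descriptors, concatenate, then L2-normalize.

    This pooling scheme is an assumption made for the sketch.
    """
    if descriptors.shape[0] == 0:
        # Hypothetical fallback: an empty mask yields a zero vector of length 2C.
        return np.zeros(2 * descriptors.shape[1])
    pooled = np.concatenate([descriptors.mean(axis=0), descriptors.max(axis=0)])
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def three_stream_feature(maps, masks) -> np.ndarray:
    """Concatenate one pooled vector per stream (illustratively: whole, head, torso)."""
    return np.concatenate(
        [aggregate(select_descriptors(f, m)) for f, m in zip(maps, masks)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C = 14, 14, 512  # illustrative sizes, not taken from the paper
    maps = [rng.random((H, W, C)) for _ in range(3)]
    whole = np.ones((H, W), dtype=bool)
    head = np.zeros((H, W), dtype=bool)
    head[1:5, 4:10] = True
    torso = np.zeros((H, W), dtype=bool)
    torso[5:12, 3:11] = True
    feature = three_stream_feature(maps, [whole, head, torso])
    print(feature.shape)  # (3072,)
```

Pooling over however many positions the mask keeps gives every stream a fixed-length vector of 2C dimensions, so no fully connected layer is needed to reach a fixed-size image representation; the final length is 3 × 2C for three streams under these assumptions.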