Pixel Saliency Based Encoding for Fine-Grained Image Classification.

Chao Yin, Lei Zhang, Ji Liu
DOI: https://doi.org/10.1007/978-3-030-03398-9_24
2018-01-01
Abstract: Fine-grained image classification concerns categorization at subordinate levels, where the differences between classes are very subtle and highly local. Recently, Convolutional Neural Networks (CNNs) have achieved nearly the best results on basic image classification tasks. In a CNN, a direct pooling operation is commonly used to resize the last convolutional feature maps from n × n × c to 1 × 1 × c for feature representation. However, such pooling may lead to extreme compression of the saliency information in the feature maps, particularly in fine-grained image classification. In this paper, to explore the representation ability of the feature maps more deeply, we propose a Pixel Saliency based Encoding method called PS-CNN. First, PS-CNN obtains a saliency matrix by evaluating the saliency of each pixel in the feature map. Then, multiple binary masks are generated by thresholding this saliency matrix; the masks segment the original feature maps into multiple masked feature maps, which are subsequently squeezed into encoded ones. Finally, a fine-grained feature representation is generated by concatenating the original feature maps with the encoded ones. Experimental results show that our simple yet powerful PS-CNN outperforms state-of-the-art classification approaches. In particular, we achieve 89.1% classification accuracy on Aircraft, 92.3% on Stanford Cars, and 81.9% on NABirds.
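The encoding pipeline described in the abstract (saliency matrix, thresholded binary masks, squeezed masked maps, concatenation with the original features) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how pixel saliency is computed, which thresholds are used, or how masked maps are squeezed. The sketch therefore assumes, purely for illustration, that saliency is the channel-wise sum of activations normalized to [0, 1], that the thresholds are 0.25, 0.5 and 0.75, that each masked map is squeezed by averaging over the pixels its mask keeps, and that the original n × n × c map is reduced by global average pooling before concatenation. The function name `ps_encode` and its `thresholds` parameter are hypothetical.

```python
import numpy as np


def ps_encode(feature_map: np.ndarray, thresholds=(0.25, 0.5, 0.75)) -> np.ndarray:
    """Hypothetical sketch of pixel-saliency encoding for one (n, n, c) feature map.

    Assumptions (not from the paper): saliency = channel-wise sum of activations,
    min-max normalized to [0, 1]; one binary mask per threshold; each masked map
    is squeezed to a c-vector by averaging over the kept pixels; the original map
    is squeezed by global average pooling.
    """
    _, _, c = feature_map.shape

    # Saliency matrix: one score per spatial position (pixel), shape (n, n).
    saliency = feature_map.sum(axis=2)
    s_min, s_max = saliency.min(), saliency.max()
    saliency = (saliency - s_min) / (s_max - s_min + 1e-12)

    # Direct pooling of the original map: (n, n, c) -> (c,).
    parts = [feature_map.mean(axis=(0, 1))]

    for t in thresholds:
        # Binary mask of pixels whose saliency reaches the threshold.
        mask = (saliency >= t).astype(feature_map.dtype)
        count = mask.sum()
        masked = feature_map * mask[:, :, None]
        # Squeeze the masked map to a c-vector; an empty mask yields zeros.
        if count > 0:
            encoded = masked.sum(axis=(0, 1)) / count
        else:
            encoded = np.zeros(c, dtype=feature_map.dtype)
        parts.append(encoded)

    # Final representation: pooled original features followed by one encoded vector per mask.
    return np.concatenate(parts)


rng = np.random.default_rng(0)
fmap = rng.random((14, 14, 512)).astype(np.float32)
feat = ps_encode(fmap)
print(feat.shape)  # (2048,)
```

With three thresholds and c = 512, the representation grows from 512 to 4 × 512 = 2048 dimensions, which shows the trade-off the abstract implies: masks keep region-specific statistics that a single global pooling would average away, at the cost of a longer feature vector.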