Fisher Vector Based CNN Architecture for Image Classification.

Yan Song, Peiseng Wang, Xinhai Hong, Ian McLoughlin
DOI: https://doi.org/10.1109/icip.2017.8296344
2017-01-01
Abstract: In this paper, we tackle the representation learning problem for small-scale fine-grained object recognition and scene classification tasks. Conventional bag-of-features (BoF) methods exploit hand-crafted front-end local features and learn the representations via various machine learning techniques. Convolutional neural networks (CNNs) learn the representation directly from raw images and benefit from jointly optimizing the network parameters in an end-to-end manner. However, the performance of existing representation learning methods is still unsatisfactory for small-scale recognition tasks. To address this issue, we present a Fisher vector (FV) coding based CNN (FV-CNN) architecture. FV-CNN has three main advantages. First, it exploits activations from an intermediate convolutional layer and a probabilistic discriminative model to derive the FV coding. Second, it takes advantage of end-to-end back-propagation of the gradients to jointly optimize the whole learning process. Third, it can learn a compact representation. Evaluated on benchmark datasets for fine-grained object recognition (Caltech-CUB200) and scene classification (MIT67), FV-CNN achieves accuracies of 88.0% and 82.2%, respectively.
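The abstract builds FV-CNN around Fisher vector coding of intermediate convolutional activations. As a rough illustration only, the sketch below computes the standard "improved" Fisher vector (mean and variance gradients, then power and L2 normalization) of a feature map's spatial activations under a fixed diagonal Gaussian mixture model. This is an assumption: the abstract describes a probabilistic discriminative model trained end to end, and its exact form is not given here. The function name `fisher_vector`, the toy 7×7×16 feature map, and the random mixture parameters are hypothetical.

```python
import numpy as np


def fisher_vector(x, weights, means, variances):
    """Improved Fisher vector of local descriptors x (N x D) under a diagonal GMM.

    Illustrative sketch (assumed standard FV formulation, not the paper's exact model).
    weights: (K,), means: (K, D), variances: (K, D) diagonal covariances.
    Returns a vector of length 2*K*D (mean gradients, then variance gradients).
    """
    x = np.atleast_2d(x)
    n, _ = x.shape
    # Whitened differences between each descriptor and each Gaussian mean: (N, K, D).
    diff = (x[:, None, :] - means[None, :, :]) / np.sqrt(variances)[None, :, :]

    # Posterior (soft-assignment) probabilities gamma[n, k], computed in log space.
    log_prob = (np.log(weights)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2, axis=2))
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Normalized gradients with respect to the means and the standard deviations.
    g_mu = np.einsum("nk,nkd->kd", gamma, diff) / (n * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1) / (n * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Signed square-root (power) normalization followed by L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Usage with hypothetical data: treat each spatial position of a conv feature map
# (H x W x C) as one C-dimensional local descriptor.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((7, 7, 16))
descriptors = feature_map.reshape(-1, 16)  # 49 descriptors of dimension 16

K = 4  # number of Gaussian components (hypothetical)
weights = np.full(K, 1.0 / K)
means = rng.standard_normal((K, 16))
variances = np.ones((K, 16))

fv = fisher_vector(descriptors, weights, means, variances)
print(fv.shape)  # (128,) = 2 * K * C
```

Every step above is differentiable with respect to the descriptors and the mixture parameters. In principle, the same computation could therefore be written in an autodiff framework and optimized jointly with the convolutional layers, which is the kind of end-to-end training the abstract describes. The output length 2·K·C also shows why the number of components and the channel count of the chosen layer govern how compact the representation is.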
What problem does this paper attempt to address?