Scene Understanding Based on Multi-Scale Pooling of Deep Learning Features

DongYang Li, Yue Zhou
DOI: https://doi.org/10.2991/amcce-15.2015.308
2015-01-01
Abstract: Deep convolutional neural networks (CNNs) have recently shown impressive performance as generic representations for recognition. However, global CNN features lack geometric invariance, which limits their robustness for classification and detection of highly variable objects. To improve the invariance of the features without degrading their discriminative power, and to speed up computation, we take the following two steps. First, we adopt the scheme called multi-scale orderless pooling (MOP-CNN), which extracts CNN activations from local patches of the image at multiple scale levels, performs orderless VLAD pooling of these activations at each level separately, and concatenates the results. Second, to speed up computation, we adopt SPP-net as the CNN architecture. With SPP-net, we compute the feature maps from the entire image only once and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This avoids repeatedly computing the convolutional features. On the challenging SUN397 scene classification dataset, our method achieves competitive classification results.
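The combination of MOP-CNN pooling with SPP-net region pooling can be illustrated with a short pure-Python sketch. This is not the authors' implementation. It assumes one precomputed convolutional feature map, stored as an H x W x C nested list, with region coordinates already projected into feature-map cells rather than image pixels. Each sliding window at each scale is max-pooled over an SPP-style pyramid (1x1 and 2x2 bins). The resulting descriptors are VLAD-encoded separately per scale and then concatenated. The function names (spp_pool, windows, vlad, mop_features), the window sizes, strides and pyramid levels, the signed-square-root plus L2 normalization, and the toy codebooks are all illustrative assumptions. In practice the codebook would be learned (for example with k-means) and the feature map would come from a real conv layer.

```python
import math


def spp_pool(fmap, region, levels=(1, 2)):
    """Spatial-pyramid max pooling of one region of a shared feature map.

    fmap:   H x W x C nested lists of conv activations, computed once per image.
    region: (y0, x0, y1, x1) in feature-map cells (half-open), assumed to be
            already projected from image coordinates.
    Returns a fixed-length vector of C * sum(n * n for n in levels) values.
    """
    y0, x0, y1, x1 = region
    h, w = y1 - y0, x1 - x0
    channels = len(fmap[0][0])
    out = []
    for n in levels:
        for i in range(n):
            # Bin edges: floor for the start, ceil for the end, so no bin is empty.
            r0, r1 = y0 + math.floor(i * h / n), y0 + math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = x0 + math.floor(j * w / n), x0 + math.ceil((j + 1) * w / n)
                for ch in range(channels):
                    out.append(max(fmap[r][c][ch]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return out


def windows(height, width, size, stride):
    """Square sliding windows (hypothetical helper) over the feature map."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield (y, x, y + size, x + size)


def vlad(descriptors, centers):
    """VLAD: accumulate residuals to the nearest center, then normalize."""
    k, d = len(centers), len(centers[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        nearest = min(range(k), key=lambda i: sum(
            (a - b) ** 2 for a, b in zip(x, centers[i])))
        for t in range(d):
            acc[nearest][t] += x[t] - centers[nearest][t]
    flat = [v for row in acc for v in row]
    # Signed square root (power normalization), then L2 normalization.
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def mop_features(fmap, scales, codebooks, levels=(1, 2)):
    """Orderless VLAD pooling at each scale separately, then concatenation.

    scales:    list of (window_size, stride) pairs in feature-map cells.
    codebooks: one list of VLAD centers per scale.
    """
    height, width = len(fmap), len(fmap[0])
    result = []
    for (size, stride), centers in zip(scales, codebooks):
        descs = [spp_pool(fmap, reg, levels)
                 for reg in windows(height, width, size, stride)]
        result.extend(vlad(descs, centers))
    return result


# Toy usage: a 4 x 4 feature map with 2 channels and arbitrary values.
fmap = [[[float(r + c), float(r * c)] for c in range(4)] for r in range(4)]
dim = 2 * (1 * 1 + 2 * 2)  # C * (1x1 + 2x2 pyramid bins) = 10
codebooks = [[[0.0] * dim, [5.0] * dim],  # scale 1: one 4x4 window
             [[0.0] * dim, [3.0] * dim]]  # scale 2: nine 2x2 windows
feat = mop_features(fmap, [(4, 4), (2, 1)], codebooks)
print(len(feat))  # 40 = 2 scales * 2 centers * 10 dims
```

Every window reuses the same feature map, so the convolutional layers run only once per image regardless of how many windows are pooled. This is the saving the abstract attributes to SPP-net. The final length depends only on the number of scales, codebook sizes and pyramid bins, not on the image size.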