A Scene Recognition Method Using Sparse Features with Layout-Sensitive Pooling and Extreme Learning Machine

Lingying Wu, Yuanlong Yu, Jason Gu
DOI: https://doi.org/10.1109/icinfa.2016.7831818
2016-01-01
Abstract: Scene recognition aims to find a semantic explanation of a scene, i.e., it helps intelligent machines know where they are. It can be applied to a wide range of tasks in computer vision and robotics. Most pioneering methods extracted a set of low-level features and fed them directly into a classifier to identify the scene category, but it has been shown that low-level features alone do not work well. Researchers now aim to bridge the semantic gap between low-level visual features and high-level semantic categories in order to improve recognition performance, so much attention has been paid to transforming low-level descriptors into richer intermediate representations. This paper proposes a novel method, based on an intermediate feature representation, for recognizing the semantic category of a scene image. The proposed method applies sparse coding to SIFT features and introduces a spatial-layout-sensitive pooling method. The spatial layout for pooling is based on three rectangular partitions of each image, of size 1×1, 1×4, and 4×1. They are derived from inherent characteristics of scene images by regularly dividing the image in the horizontal and vertical directions. This spatial pooling strategy is simpler and yields an effective representation of scene images. An extreme learning machine (ELM) is used as the classifier, since ELM has shown great ability to fit nonlinear classification boundaries. Experimental results show that the proposed method not only extracts lower-dimensional image features but also outperforms other similar state-of-the-art methods in terms of recognition performance.
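The pooling layout and classifier described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: max pooling over absolute sparse codes, reading "1×4" as one row by four columns and "4×1" as four rows by one column, a tanh hidden layer, and a ridge-regularized least-squares solve for the ELM output weights. The names `layout_pool`, `elm_train`, and `elm_predict` are hypothetical.

```python
import numpy as np

# Assumed grid layouts (rows, cols): whole image, 4 vertical strips, 4 horizontal strips.
LAYOUTS = [(1, 1), (1, 4), (4, 1)]

def layout_pool(codes, xy, width, height):
    """Pool sparse codes over the 1x1, 1x4 and 4x1 layouts (9 regions in total).

    codes: (N, K) sparse codes, one row per local descriptor.
    xy:    (N, 2) descriptor positions as (x, y) pixel coordinates.
    Returns a vector of length 9 * K. Max pooling is an assumption here.
    """
    codes = np.abs(codes)
    k = codes.shape[1]
    pooled = []
    for rows, cols in LAYOUTS:
        # Map each descriptor to its grid cell, clamping points on the far border.
        col_idx = np.minimum((xy[:, 0] * cols / width).astype(int), cols - 1)
        row_idx = np.minimum((xy[:, 1] * rows / height).astype(int), rows - 1)
        for r in range(rows):
            for c in range(cols):
                mask = (row_idx == r) & (col_idx == c)
                # Empty cells contribute a zero vector.
                pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(k))
    return np.concatenate(pooled)

def elm_train(X, Y, n_hidden, reg=1e-3, seed=0):
    """Basic ELM: random, fixed hidden layer; output weights by regularized least squares.

    X: (M, D) image features, Y: (M, C) one-hot labels.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Return the index of the highest-scoring class for each row of X."""
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```

With a dictionary of K atoms, this layout gives a 9K-dimensional image feature (1 + 4 + 4 regions), which is one way to read the abstract's claim of a lower-dimensional representation.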