Abstract: Recently, sparse coding-based algorithms have achieved high performance on several popular scene classification benchmarks. While extensive efforts in this direction have focused on strategies for coding and dictionary learning, few works have addressed the problem of selecting optimal pooling regions. In this work, we show that the Viola-Jones algorithm, which is well known in face detection, can be tailored to learning receptive fields for sparse coding algorithms. Specifically, using the boosting approach to receptive field learning, image/scene categorization performance can be consistently improved on several benchmarks (UIUC sport event, 15 natural scenes, and the Caltech 101 dataset) to state-of-the-art levels, using only low-dimensional features and small codebook sizes. Furthermore, the “salient pooling regions” can be obtained explicitly.

A Boosting Approach to Learning Receptive Fields for Scene Categorization