Abstract: The Bag-of-Features (BoF) model has played an important role in image representation for many multimedia applications. It has been applied extensively to tasks including image classification, image retrieval, and scene understanding. Despite its simplicity, efficiency, and generality, the model has notable drawbacks, including the limited semantic expressiveness of local descriptors and the lack of robust structure beyond single visual words. To overcome these problems, various techniques have been proposed, such as multiple descriptors, spatial context modeling, and interest region detection. Although these techniques have been shown to improve the BoF model to some extent, a coherent scheme for integrating the individual modules is still lacking. To address this, we propose a novel framework based on spatial pooling of heterogeneous features. Our framework differs from the traditional BoF model in three respects. First, we propose a new scheme for combining texture-based and edge-based local features at the descriptor extraction level. Next, we build geometric visual phrases upon the heterogeneous features to model spatial context for a mid-level image representation. Finally, a simple and effective spatial weighting scheme based on a smoothed edge map is applied to this mid-level representation. We test the integrated framework on several benchmark datasets for image classification and retrieval. Extensive results show that our algorithm outperforms state-of-the-art methods.
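To make the descriptor fusion and edge-based spatial weighting steps more concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation, and it omits geometric visual phrases entirely. Several choices are assumptions made only for illustration. Descriptor-level fusion is modeled as per-keypoint L2 normalization followed by concatenation. The smoothed edge map is a box-filtered gradient magnitude. Encoding uses hard assignment to the nearest codeword, and each keypoint contributes its edge-map value to the histogram instead of a plain count. The function names `combine_descriptors`, `smooth_edge_map`, and `weighted_bof` are hypothetical.

```python
import numpy as np

def combine_descriptors(texture, edge):
    """Hypothetical descriptor-level fusion (an assumption, not the paper's scheme):
    L2-normalize each descriptor type per keypoint, then concatenate them."""
    t = texture / (np.linalg.norm(texture, axis=1, keepdims=True) + 1e-12)
    e = edge / (np.linalg.norm(edge, axis=1, keepdims=True) + 1e-12)
    return np.hstack([t, e])

def smooth_edge_map(gray, k=5):
    """Assumed edge map: gradient magnitude smoothed by a k x k box filter,
    rescaled so that the maximum is (approximately) 1."""
    gy, gx = np.gradient(gray.astype(float))  # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    pad = k // 2
    padded = np.pad(mag, pad, mode="edge")
    out = np.zeros_like(mag)
    for dy in range(k):  # sum the k x k neighbourhood by shifting the padded map
        for dx in range(k):
            out += padded[dy:dy + mag.shape[0], dx:dx + mag.shape[1]]
    out /= k * k
    return out / (out.max() + 1e-12)

def weighted_bof(descriptors, keypoints, codebook, weight_map):
    """Hard-assign each descriptor to its nearest codeword and accumulate the
    edge-map value at its (row, col) keypoint instead of a count of 1.
    Returns an L1-normalized histogram (all zeros if every weight is zero)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    w = weight_map[keypoints[:, 0], keypoints[:, 1]]
    hist = np.bincount(words, weights=w, minlength=len(codebook))
    total = hist.sum()
    return hist / total if total > 0 else hist
```

As a usage example, build an edge map from a grayscale image, fuse texture and edge descriptors sampled at keypoints, and encode them with `weighted_bof(fused, keypoints, codebook, smooth_edge_map(image))`. With this weighting, keypoints in flat, edge-free regions contribute little or nothing to the histogram, while keypoints near strong edges dominate it. That matches the intuition behind edge-based spatial weighting, although the paper's actual weighting function may differ.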

Adaptive Bilinear Pooling for Fine-grained Representation Learning

Fine-grained Visual Classification Via Multilayer Bilinear Pooling with Object Localization

Learning Relative Features Through Adaptive Pooling for Image Classification

DeepBP: A bilinear model integrating multi-order statistics for fine-grained recognition

Compact Bilinear Pooling via General Bilinear Projection

Learning Attentive Pairwise Interaction for Fine-Grained Classification

Spatial pooling of heterogeneous features for image applications

Compare More Nuanced: Pairwise Alignment Bilinear Network For Few-shot Fine-grained Learning

Adaptive Salience Preserving Pooling for Deep Convolutional Neural Networks

GBP: Graph convolutional network embedded in bilinear pooling for fine-grained encoding

Combining Local and Global: Rich and Robust Feature Pooling for Visual Recognition

Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering

Learning Deep Bilinear Transformation for Fine-grained Image Representation

Improved Bilinear CNN Model for Remote Sensing Scene Classification

Cross-convolutional-layer Pooling for Generic Visual Recognition

Bag of Shape Features with a Learned Pooling Function for Shape Recognition

Generalized regular spatial pooling for image classification

R2FP: Rich and Robust Feature Pooling for Mining Visual Data

Fine-grained Species Recognition with Privileged Pooling: Better Sample Efficiency Through Supervised Attention

CSPS: An Adaptive Pooling Method for Image Classification