Abstract: Derived from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) as the generative model for local features. However, the representative power of a GMM can be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes, and the number of prototypes is usually small in FVC. To alleviate this limitation, in this work we break the convention that a local feature is drawn from one of a few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is a linear combination of multiple key components, and the combination weight is a latent random variable. In doing so, we greatly enhance the representative power of the generative model underlying FVC. To implement our idea, we design two particular generative models following this compositional approach. In our first model, the mean vector is sampled from the subspace spanned by a set of bases and the combination weight is drawn from a Laplace distribution. In our second model, we further assume that a local feature is composed of a discriminative part and a residual part. As a result, a local feature is generated by a linear combination of discriminative part bases and residual part bases. The decomposition into discriminative and residual parts is guided by a pre-trained supervised coding method. By calculating the gradient vector of the proposed models, we derive two new Fisher vector coding strategies. The first is termed Sparse Coding-based Fisher Vector Coding (SCFVC) and can be used as a substitute for traditional GMM-based FVC. The second is termed Hybrid Sparse Coding-based Fisher Vector Coding (HSCFVC) since it combines the merits of both pre-trained supervised coding methods and FVC. Using pre-trained Convolutional Neural Network (CNN) activations as local features, we experimentally demonstrate that the proposed methods are superior to traditional GMM-based FVC and achieve state-of-the-art performance in various image classification tasks.
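To make the first model more concrete, the following is a minimal sketch of how a gradient-based encoding could be computed when the mean vector is a linear combination of bases with a Laplace-distributed weight. It rests on several assumptions that the abstract does not state: the Gaussian has isotropic covariance, the integral over the latent weight is approximated by its most probable value (which reduces to an L1-regularized sparse coding problem), and per-feature gradients are sum-pooled over an image. The names `sparse_code`, `scfvc_like_encoding`, `lam`, and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sparse_code(x, B, lam=0.1, n_iter=200):
    """Approximate argmin_u 0.5*||x - B u||^2 + lam*||u||_1 with ISTA.

    Under the assumptions above, this is the MAP estimate of the latent
    combination weight u given a local feature x and bases B (d x m).
    """
    L = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the smooth term
    u = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ u - x)           # gradient of 0.5*||x - B u||^2
        z = u - grad / L                   # gradient step
        u = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return u

def scfvc_like_encoding(X, B, lam=0.1):
    """Sum-pool the gradient of the approximate log-likelihood w.r.t. B.

    For each local feature x with code u, the gradient (up to a constant
    scale) is the outer product (x - B u) u^T. X is (n x d), B is (d x m);
    the result is a flattened (d*m,) image-level vector.
    """
    G = np.zeros(B.shape)
    for x in X:
        u = sparse_code(x, B, lam)
        G += np.outer(x - B @ u, u)        # residual times sparse code
    return G.ravel()
```

Because only the bases with nonzero code entries receive a nonzero gradient column, the resulting vector is sparse at the level of whole bases, which is one way to read the claim that such a model can use many more components than a small GMM.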

Deep-based Fisher Vector for Mobile Visual Search

Feature Based Inter Prediction Optimization for Non-Translational Video Coding in Cloud

Action Recognition with Stacked Fisher Vectors

Selectively Aggregated Fisher Vectors of Query Video for Mobile Visual Search

CFVL: A Coarse-to-Fine Vehicle Localizer with Omnidirectional Perception Across Severe Appearance Variations

Optimizing Binary Fisher Codes for Visual Search

Object Detection and Localization in 3D Environment by Fusing Raw Fisheye Image and Attitude Data

Spatial Weighted Fisher Vector for Image Retrieval

Downside Hemisphere Object Detection and Localization of MAV by Fisheye Camera

Codebook-Free Compact Descriptor for Scalable Visual Search

Robust Fisher codes for large scale image retrieval

Deep FisherNet for Object Classification

Encoding High Dimensional Local Features By Sparse Coding Based Fisher Vectors

VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products

Mobile Visual Search Compression with Grassmann Manifold Embedding

Deep Visual-semantic for Crowded Video Understanding

Depth-based Local Feature Selection for Mobile Visual Search

A Compact Binary Aggregated Descriptor Via Dual Selection for Visual Search

GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring

Compositional Model Based Fisher Vector Coding for Image Classification

Rate-Adaptive Compact Fisher Codes for Mobile Visual Search