Abstract: Bag-of-words models have been widely used to obtain a global representation for action recognition. However, these models ignore structural information, such as the spatial and temporal context of local features. In this paper, we propose a novel structured codebook construction method that encodes spatial and temporal contextual information among local features for video representation. Given a set of training videos, our method first extracts local motion and appearance features. Next, it encodes the spatial and temporal contextual information among these features by constructing correlation matrices for local spatio-temporal features. It then discovers common patterns of movement, which form the structured codebook. Each action is represented by a set of sparse coefficients with respect to this codebook. Finally, a simple linear SVM classifier predicts the action class from this representation. Our method has two main advantages over traditional methods. First, it automatically discovers mid-level common patterns of movement that capture rich spatial and temporal contextual information. Second, it is robust to unwanted background local features: most such features cannot be sparsely represented by the common patterns, so they are treated as residual errors and are not encoded into the action representation. We evaluate the proposed method on two popular benchmarks, the KTH action dataset and the UCF Sports dataset. Experimental results demonstrate the advantages of our structured codebook construction.
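
The robustness argument in the abstract (features that the codebook cannot represent sparsely end up in the residual) can be illustrated with a minimal sketch. The abstract does not specify the sparse solver or the feature layout, so everything here is an assumption: greedy matching pursuit stands in for the sparse coding step, and the two-atom `codebook` and the `foreground` and `background` vectors are hypothetical toy data, not the paper's correlation-matrix features.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(v, v))


def matching_pursuit(x, codebook, n_nonzero=1):
    """Greedy sparse coding with at most n_nonzero selection steps.

    Assumes every atom in `codebook` has unit norm. Returns the
    coefficient vector and the residual left unexplained.
    """
    residual = list(x)
    coeffs = [0.0] * len(codebook)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        scores = [dot(residual, atom) for atom in codebook]
        k = max(range(len(codebook)), key=lambda i: abs(scores[i]))
        coeffs[k] += scores[k]
        # Subtract that atom's contribution from the residual.
        residual = [r - scores[k] * a for r, a in zip(residual, codebook[k])]
    return coeffs, residual


# Hypothetical codebook: two unit-norm "patterns" in a 3-D feature space.
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

foreground = [2.0, 0.1, 0.0]  # lies close to pattern 0
background = [0.0, 0.0, 3.0]  # orthogonal to every pattern

fg_coeffs, fg_res = matching_pursuit(foreground, codebook)
bg_coeffs, bg_res = matching_pursuit(background, codebook)

print(fg_coeffs, norm(fg_res))
print(bg_coeffs, norm(bg_res))
```

In this toy setting the foreground feature is captured almost entirely by one coefficient, while the background feature receives all-zero coefficients and stays in the residual, so it contributes nothing to the coefficient-based representation.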

Adaptive Learning Codebook for Action Recognition

Learning human actions with an adaptive codebook

Study on the Generation Model of Weighted Visual Codebook for Action Recognition

Shrinking Encoding with Two-Level Codebook Learning for Fine-Grained Fish Recognition

Action Recognition Via Structured Codebook Construction

Learning Discriminative Visual Codebook For Human Action Recognition

Optimized codebook construction method for class-specific recognition tasks

Discriminative Codebook Learning for Web Image Search

A Fast Algorithm For Creating A Compact And Discriminative Visual Codebook

Visual Codebook Construction for Class-Specific Recognition

Codebook Enhancement of VLAD Representation for Visual Recognition

Codebook Optimization Using Word Activation Forces for Scene Categorization

Class-Specific Codebook Construction for Biologically Inspired Recognition

Discriminative Spatial Codebook Generation for Image Classification

Codebook Reconstruction with Word Correlation Feedback Mechanism

Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification

Incremental Codebook Adaptation for Visual Representation and Categorization

Group Sparse Representation of Adaptive Sub-Domain Selection for Image Classification

An Incremental Clustering Based Codebook Construction in Video Copy Detection

Category Sensitive Codebook Construction for Object Category Recognition

LDA based compact and discriminative dictionary learning for sparse coding