Abstract: Local spatio-temporal descriptors and feature encoding algorithms are two crucial steps in human action recognition based on spatio-temporal interest points (STIP). Because local descriptors for STIP essentially describe texture-based motion information, the key to local feature description is extracting invariant, robust and discriminative local texture features and motion information within the reference spatio-temporal volume. The scattering transform is an image transform based on directional wavelet transforms and multi-scale convolution; for local texture features it offers local translation invariance, rotation invariance and stability to elastic deformations. This paper proposes a novel local descriptor for STIP based on a spatio-temporal three-dimensional scattering transform, which extends the original scattering transform to the three-dimensional spatio-temporal domain. Compared with traditional descriptors such as HOG and HOF, the proposed scattering transform coefficient-based histogram of oriented gradients (STC-HOG) descriptor captures more robust and discriminative local texture motion information for STIP. Incorporating local descriptors into an action video representation requires a feature encoding algorithm. Because the vector of locally aggregated descriptors (VLAD) loses information about where features are distributed during encoding, a Gaussian kernel-based histogram of distribution vector of locally aggregated descriptors (HOD-VLAD) is proposed. We validate the proposed algorithm for human action recognition on multiple publicly available datasets, including KTH, UCF Sports and HMDB51. Experimental results indicate that the proposed descriptor and encoding method improve both the efficiency and the accuracy of human action recognition.
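The abstract does not give the HOD-VLAD formula, so the following is only a minimal sketch of the standard VLAD baseline it modifies, to make the "lost distribution information" point concrete. Assumptions not taken from the paper: hard nearest-centroid assignment, a pre-learned codebook of centroids, signed square-root plus L2 normalization (common VLAD choices), and the hypothetical function name `vlad_encode`. The Gaussian-kernel histogram that HOD-VLAD adds is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad_encode(descriptors: Sequence[Sequence[float]],
                centroids: Sequence[Sequence[float]]) -> Vector:
    """Standard hard-assignment VLAD (illustrative sketch, not HOD-VLAD).

    Each local descriptor is assigned to its nearest centroid (visual word),
    and residuals (descriptor - centroid) are summed per centroid. The K
    per-centroid sums are concatenated, then signed-square-root and L2
    normalized. Normalization choices here are assumptions.
    """
    k = len(centroids)
    d = len(centroids[0])
    # One residual accumulator of length d per visual word.
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard assignment: index of the nearest centroid.
        j = min(range(k), key=lambda i: _sq_dist(x, centroids[i]))
        for t in range(d):
            v[j][t] += x[t] - centroids[j][t]
    flat = [val for block in v for val in block]
    # Power (signed square-root) normalization.
    flat = [math.copysign(math.sqrt(abs(val)), val) for val in flat]
    # Global L2 normalization; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat
```

For example, with centroids `[[0.0, 0.0], [10.0, 10.0]]`, the descriptor sets `[[1.0, 0.0], [-1.0, 0.0], [11.0, 10.0]]` and `[[11.0, 10.0]]` both encode to `[0.0, 0.0, 1.0, 0.0]`: the two descriptors spread symmetrically around the first centroid cancel out, so VLAD cannot tell them apart from an empty cluster. This is one way the spread of features around each centroid is discarded, which is the kind of information a distribution-aware encoding aims to retain.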

Human Action Recognition Based on Spatio-Temporal Three-Dimensional Scattering Transform Descriptor and an Improved VLAD Feature Encoding Algorithm