Abstract: Various improved methods based on the bag-of-words (BoW) strategy are widely used for human action recognition. However, these methods measure and exploit the spatial relationships between features in a single, relatively simple way, which limits their recognition performance. To address this problem, double constrained bag of words (DC-BoW) is proposed to exploit the spatial distribution information between features at three levels: descriptor-level features, representation-level features, and hidden-layer features. Most coding methods rely only on Euclidean distance to constrain the relationship between descriptor-level features. To go beyond this, constraints on the difference in length and on the cosine of the angle between each visual word and the local feature are designed and used to build the loss function of the length and angle constrained linear coding (LACLC) method. To improve the discriminability of the representation-level features, the spatial distribution of the encoded features around each cluster center is considered, and hierarchical weighting and LACLC are jointly applied to this distribution to construct the aggregated word group feature (AWGF). In addition, the constraint form of correntropy is modified following the same principle used to construct the constraints in LACLC. The hidden-layer features are combined with the new constraint forms to construct the double constrained extreme learning machine (DC-ELM), which improves the classification performance of the network while avoiding iterative training of the correntropy weights. To verify the feasibility of DC-BoW, experiments are conducted on the KTH, Olympic Sports, UCF11, Hollywood2 and UCF101 datasets. The experimental results show that DC-BoW makes further use of the spatial distribution information between features and achieves higher recognition accuracy than other improved BoW-based methods.
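The abstract does not spell out the LACLC loss, so the following is a hypothetical illustration only. It assumes LACLC behaves like locality-constrained linear coding (LLC), minimizing ‖x − Σᵢ cᵢbᵢ‖² + λ Σᵢ (dᵢcᵢ)² subject to Σᵢ cᵢ = 1. In this sketch the Euclidean locality term dᵢ is augmented by an assumed squared length difference (‖x‖ − ‖bᵢ‖)² and an assumed angle term 1 − cos θᵢ. The function name `lacl_code` and the parameters `lam`, `alpha`, `beta` and `knn` are illustrative and are not taken from the paper.

```python
import numpy as np


def lacl_code(x, codebook, lam=1e-4, alpha=1.0, beta=1.0, knn=5):
    """Hypothetical LACLC-style encoding of one descriptor x.

    codebook has one visual word per row. The per-word penalty d_i mixes
    Euclidean distance with ASSUMED length and angle terms; the real LACLC
    loss may differ.
    """
    x = np.asarray(x, dtype=float)
    B = np.asarray(codebook, dtype=float)
    b_norms = np.linalg.norm(B, axis=1)
    x_norm = np.linalg.norm(x)

    # Assumed penalty terms between x and every visual word b_i.
    eucl = np.sum((B - x) ** 2, axis=1)                  # ||x - b_i||^2
    length = (b_norms - x_norm) ** 2                      # (||x|| - ||b_i||)^2
    cos = np.clip(B @ x / (b_norms * x_norm + 1e-12), -1.0, 1.0)
    d = eucl + alpha * length + beta * (1.0 - cos)        # combined penalty

    # As in LLC, only the knn words with the smallest penalty are used.
    idx = np.argsort(d)[:knn]
    Z = x - B[idx]                                        # rows: x - b_i
    # Quadratic form of the objective under sum(c) == 1, plus a tiny ridge
    # so the system stays solvable when x coincides with a visual word.
    C = Z @ Z.T + lam * np.diag(d[idx] ** 2) + 1e-8 * np.eye(len(idx))
    w = np.linalg.solve(C, np.ones(len(idx)))
    w /= w.sum()                                          # enforce sum(c) == 1

    code = np.zeros(len(B))
    code[idx] = w
    return code
```

The sum-to-one constraint lets the reconstruction error be written as cᵀZZᵀc, where the rows of Z are x − bᵢ. The constrained minimum therefore has the closed form c ∝ (ZZᵀ + λ diag(d²))⁻¹ 1, and the sketch solves one small linear system per descriptor instead of running an iterative optimizer.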
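The abstract also gives no formula for DC-ELM. The sketch below is a loose, hypothetical illustration of its stated goal, which is to weight samples with a correntropy-style kernel without iteratively re-estimating those weights. It trains a standard weighted extreme learning machine in three steps:

- draw random hidden weights that are never trained;
- compute per-sample weights once, from a Gaussian kernel of each hidden feature's distance to its class centroid;
- obtain the output weights from a closed-form ridge solution.

The centroid-based weights, the distance normalization and the class name `WeightedELM` are assumptions and do not reproduce the paper's DC-ELM constraints.

```python
import numpy as np


class WeightedELM:
    """Extreme learning machine with per-sample weights computed once.

    The weighting rule is a HYPOTHETICAL stand-in for DC-ELM's constraints.
    """

    def __init__(self, n_hidden=200, C=10.0, sigma=1.0, seed=0):
        self.n_hidden = n_hidden      # number of random hidden units
        self.C = C                    # ridge trade-off (larger = less regularization)
        self.sigma = sigma            # kernel width, relative to mean distance
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random sigmoid hidden layer; W and b are drawn once and never trained.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)

        # Assumed correntropy-style sample weights, computed in one pass:
        # Gaussian kernel of each hidden feature's squared distance to its
        # class centroid, scaled by the mean distance.
        centroids = np.stack([H[y == c].mean(axis=0) for c in self.classes_])
        dist2 = np.sum((H - centroids[np.searchsorted(self.classes_, y)]) ** 2, axis=1)
        s = np.exp(-dist2 / (2.0 * self.sigma ** 2 * dist2.mean() + 1e-12))

        # Weighted ridge solution: beta = (H^T S H + I / C)^(-1) H^T S T.
        HS = H * s[:, None]
        A = H.T @ HS + np.eye(self.n_hidden) / self.C
        self.beta = np.linalg.solve(A, HS.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Because the kernel weights are computed once, before the single linear solve, this sketch has no reweighting loop. Samples that lie far from their class centroid in hidden-feature space simply contribute less to the output weights.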

Double constrained bag of words for human action recognition