Abstract: Extreme learning machine (ELM) is a fast and efficient classifier. However, ELM-based networks cannot process descriptor-level features extracted from video sequences, so they cannot be applied directly to human action recognition. An encoding learning network (ELN) is proposed to solve this problem. The network consists of a feature encoding module and a double similarity-constrained extreme learning machine (DS-ELM). In the feature encoding module, a sparse mapping weight matrix is combined with pyramid pooling to generate representation-level features, which DS-ELM then classifies. To exploit the similarity information between the features of each layer, the different weight matrices in ELN are trained separately to improve recognition ability. For the sparse mapping weight matrix, an auto-encoded dictionary and a similarity-constrained linear coding (SCLC) method are proposed to encode the desired output; the matrix is then trained on a subset of the descriptor features and their corresponding desired outputs. For the classification weights, the ELM objective function is updated with the similarity relationship between hidden-layer features to derive the DS-ELM training formula, which improves classification performance while avoiding iterative training. To verify the feasibility of ELN, experiments are conducted on the Olympic Sports, UCF11, Hollywood2, UCF101, and self-collected databases. The experimental results show that the proposed ELN can process descriptor features directly. Moreover, ELN further exploits the similarity information between the features of each layer and achieves excellent recognition performance compared with other improved ELM-based methods.
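
To make the phrase "avoiding iterative training" concrete, the following is a minimal sketch of a standard ridge-regularized ELM, not the DS-ELM proposed in the paper. The hidden-layer weights are drawn at random and kept fixed, and the output weights are obtained in closed form by solving a single linear system. The function names (`train_elm`, `predict_elm`), the tanh activation, the hidden size, the regularization constant `C`, and the toy two-cluster data are all assumptions made for illustration. The similarity constraints of DS-ELM are not specified in the abstract and are therefore not included.

```python
import numpy as np

def train_elm(X, T, n_hidden=64, C=1e3, seed=0):
    """Train a basic ELM (illustrative; names and defaults are assumptions)."""
    rng = np.random.default_rng(seed)
    # Random input weights and biases: fixed after sampling, never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer feature matrix
    # Closed-form ridge solution: beta = (H^T H + I / C)^{-1} H^T T
    A = H.T @ H + np.eye(n_hidden) / C
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Return the index of the largest output unit for each sample."""
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy data (hypothetical): two well-separated Gaussian clusters in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)),
               rng.normal(2.0, 0.5, (50, 2))])
y = np.repeat([0, 1], 50)
T = np.eye(2)[y]  # one-hot desired outputs

W, b, beta = train_elm(X, T)
accuracy = float(np.mean(predict_elm(X, W, b, beta) == y))
print(f"training accuracy: {accuracy:.2f}")
```

Because only `beta` is learned, training reduces to one call to `np.linalg.solve`. A similarity-constrained variant would presumably add further terms to this linear system, but their exact form is given in the paper body rather than in the abstract.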

Learning Explicit Shape And Motion Evolution Maps For Skeleton-Based Human Action Recognition

Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks

Skeleton edge motion networks for human action recognition

Action Recognition Based on Global Optimal Similarity Measuring

Explorations of Skeleton Features for LSTM-based Action Recognition

Spatial-Temporal Data Augmentation Based on LSTM Autoencoder Network for Skeleton-Based Human Action Recognition

Skeleton-based Action Recognition Using LSTM and CNN

Recognizing Human Actions As the Evolution of Pose Estimation Maps

Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks

Human Action Recognition From Digital Videos Based on Deep Learning

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Semi‐supervised Long Short‐term Memory for Human Action Recognition

Lattice Long Short-Term Memory for Human Action Recognition

Encoding Learning Network Combined with Feature Similarity Constraints for Human Action Recognition

DB-LSTM: Densely-connected Bi-directional LSTM for Human Action Recognition

An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

Skeleton-based action recognition with hierarchical spatial reasoning and temporal stack learning network

Representation Learning of Temporal Dynamics for Skeleton-Based Action Recognition

Fusing Shape and Motion Matrices for View Invariant Action Recognition Using 3D Skeletons