Abstract: The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented toward human-robot communication, action classification, and action-driven perception. These challenging tasks generally involve processing a large amount of visual information, as well as learning-based mechanisms that generalize from a set of training actions and classify new samples. To operate in natural environments, a system must recognize actions efficiently and robustly, including under noisy conditions caused, for instance, by systematic sensor errors or temporarily occluded persons. Studies of the mammalian visual system and its superior ability to process biological motion suggest that pose and motion features are processed in separate neural pathways at multiple levels and subsequently integrated for action perception. We present a neurobiologically motivated approach to noise-tolerant action recognition in real time. Our model consists of self-organizing Growing When Required (GWR) networks that obtain progressively generalized representations of sensory inputs and learn inherent spatio-temporal dependencies. During training, the GWR networks dynamically change their topological structure to better match the input space. We first extract pose and motion features from video sequences and then cluster actions in terms of prototypical pose-motion trajectories. Multi-cue trajectories from matching action frames are subsequently combined to provide action dynamics in the joint feature space. Our experiments show that the approach outperforms previous results on a dataset of full-body actions captured with a depth sensor, and ranks among the best results on a public benchmark of domestic daily actions.
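The abstract relies on GWR networks that grow new nodes when existing ones match the input poorly. As a rough illustration only, the Python sketch below implements a generic GWR network in the spirit of the original algorithm (Marsland et al., 2002). It inserts a new node halfway between the best-matching node and the input when the winner's activity exp(-||x - w_b||) falls below a threshold `a_t` and its habituation counter falls below `h_t`. Otherwise, the winner and its topological neighbors move toward the input. The discrete habituation rule, every hyperparameter name and value, and the hypothetical `pose_motion_windows` helper are assumptions for illustration. The helper pairs each pose with its frame-to-frame difference as a crude motion cue and concatenates a sliding window of frames into one trajectory vector. None of this is the authors' implementation or settings.

```python
import numpy as np


class GWR:
    """Minimal Growing When Required network, after Marsland et al. (2002).

    Hyperparameter names and defaults are illustrative assumptions,
    not the values used in the paper.
    """

    def __init__(self, dim, a_t=0.85, h_t=0.1, eps_b=0.1, eps_n=0.01,
                 tau_b=0.3, tau_n=0.1, kappa=1.05, max_age=50, seed=0):
        rng = np.random.default_rng(seed)
        self.W = [rng.random(dim), rng.random(dim)]  # start with two random nodes
        self.h = [1.0, 1.0]                           # habituation: 1.0 = never fired
        self.edges = {}                               # (i, j) with i < j -> edge age
        self.a_t, self.h_t = a_t, h_t
        self.eps_b, self.eps_n = eps_b, eps_n
        self.tau_b, self.tau_n, self.kappa = tau_b, tau_n, kappa
        self.max_age = max_age

    def _neighbors(self, i):
        return [b if a == i else a for (a, b) in self.edges if i in (a, b)]

    def _habituate(self, i, tau):
        # Assumed rule: dh = tau * kappa * (1 - h) - tau, so h decays
        # from 1 toward 1 - 1/kappa each time node i fires
        self.h[i] += tau * self.kappa * (1.0 - self.h[i]) - tau

    def step(self, x):
        x = np.asarray(x, dtype=float)
        d = np.linalg.norm(np.asarray(self.W) - x, axis=1)
        b, s = (int(i) for i in np.argsort(d)[:2])    # best and second-best node
        self.edges[(min(b, s), max(b, s))] = 0        # create or refresh edge b-s
        if np.exp(-d[b]) < self.a_t and self.h[b] < self.h_t:
            # Poor match by an already-trained winner: grow a node between b and x
            r = len(self.W)
            self.W.append((self.W[b] + x) / 2.0)
            self.h.append(1.0)
            del self.edges[(min(b, s), max(b, s))]
            self.edges[(b, r)] = 0
            self.edges[(s, r)] = 0
        else:
            # Otherwise move the winner and its neighbors toward x,
            # scaled by how little each node has fired so far
            self.W[b] = self.W[b] + self.eps_b * self.h[b] * (x - self.W[b])
            for n in self._neighbors(b):
                self.W[n] = self.W[n] + self.eps_n * self.h[n] * (x - self.W[n])
        for e in self.edges:                          # age every edge of the winner
            if b in e:
                self.edges[e] += 1
        self._habituate(b, self.tau_b)
        for n in self._neighbors(b):
            self._habituate(n, self.tau_n)
        self._prune()

    def _prune(self):
        # Remove edges older than max_age, then nodes left without any edge
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}
        keep = sorted({i for e in self.edges for i in e})
        if len(keep) < len(self.W):
            new = {old: k for k, old in enumerate(keep)}  # order-preserving remap
            self.W = [self.W[i] for i in keep]
            self.h = [self.h[i] for i in keep]
            self.edges = {(new[a], new[b]): age for (a, b), age in self.edges.items()}


def pose_motion_windows(poses, window=3):
    """Hypothetical helper: pair each pose with its frame difference (a crude
    motion cue) and concatenate `window` consecutive frames into one vector."""
    poses = np.asarray(poses, dtype=float)
    joint = np.hstack([poses[1:], np.diff(poses, axis=0)])   # shape (T-1, 2*D)
    return np.stack([joint[i:i + window].ravel()
                     for i in range(len(joint) - window + 1)])


# Toy usage: three 2D Gaussian blobs stand in for feature vectors
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
data = centers[rng.integers(0, 3, 1500)] + rng.normal(0.0, 0.1, (1500, 2))
net = GWR(dim=2)
for x in data:
    net.step(x)
print(len(net.W), "nodes")
windows = pose_motion_windows(np.arange(12.0).reshape(6, 2), window=3)
print(windows.shape)  # (3, 12)
```

On the toy data, the network should end up with nodes near each blob center. For action sequences, one would presumably feed the rows returned by `pose_motion_windows` to `step` instead. The paper's separate pose and motion pathways and their later integration are not modeled here.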

Human Action Recognition by Imitating the Simple Cells of Visual Cortex

Based on cluster tree human action recognition algorithm for monocular video

Human Action Recognition Using Deep Learning Methods

Human Action Recognition From Digital Videos Based on Deep Learning

Self-organizing neural integration of pose-motion features for human action recognition

Human Action Recognition Based on Three-Stream Network with Frame Sequence Features

Human Action Recognition with Contextual Constraints Using a RGB-D Sensor

A fast human action recognition network based on spatio-temporal features

Oriented Gradients for Human Action Recognition

Human action recognition in videos using motion impression image

Human Action Recognition Using Sparse Representation

Human action recognition using Adaptive Hierarchical Depth Motion Maps and Gabor filter

Deep Learning-Based Human Action Recognition in Videos

Human Action Recognition Based on Hierarchical Multi-Scale Adaptive Conv-Long Short-Term Memory Network

Human action recognition via multi-view learning

Embedding Motion and Structure Features for Action Recognition

Action Recognition in Videos through a Transfer-Learning-Based Technique

Multimodal human action recognition based on spatio-temporal action representation recognition model

Complex Human Action Recognition Using a Hierarchical Feature Reduction and Deep Learning-Based Method

Recognizing actions using depth motion maps-based histograms of oriented gradients

DB-LSTM: Densely-connected Bi-directional LSTM for Human Action Recognition