Abstract: Action recognition is an enabling technology for many real-world applications, such as human-computer interaction, surveillance, video retrieval, retirement home monitoring, and robotics. In the past decade, it has attracted a great deal of interest in the research community. Recently, the commoditization of depth sensors has generated much excitement about action recognition from depth data. New depth sensor technology has enabled many applications that were not feasible before. On one hand, action recognition becomes far easier with depth sensors. On the other hand, the drive to recognize more complex actions presents new challenges. One crucial aspect of action recognition is extracting discriminative features. Depth maps have completely different characteristics from RGB images, so directly applying features designed for RGB images does not work. Complex actions usually involve complicated temporal structures, human-object interactions, and person-person contacts, and new machine learning algorithms need to be developed to learn these complex structures. This work enables readers to quickly familiarize themselves with the latest research in depth-sensor-based action recognition and to gain a deeper understanding of recently developed techniques. It will be of great use to both researchers and practitioners who are interested in human action recognition with depth sensors. The text focuses on feature representation and machine learning algorithms for action recognition from depth sensors. After presenting a comprehensive overview of the state of the art in action recognition from depth data, the authors provide in-depth descriptions of their recently developed feature representations and machine learning techniques, including lower-level depth and skeleton features, higher-level representations to model the temporal structure and human-object interactions, and feature selection techniques for occlusion handling.

Local Mean Spatio-Temporal Feature for Depth Image-Based Speed-Up Action Recognition

3D Action Recognition Using Multi-Temporal Depth Motion Maps and Fisher Vector

Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition

Multi-Temporal Depth Motion Maps-Based Local Binary Patterns for 3-D Human Action Recognition

Exploring 3D Human Action Recognition: From Offline to Online

Local feature coding for action recognition using RGB-D camera

Action Recognition from Depth Sequences Using Weighted Fusion of 2D and 3D Auto-Correlation of Gradients Features

RGB-D action recognition using linear coding

Human Action Recognition Using Multi-Velocity STIPs and Motion Energy Orientation Histogram

Local feature extraction using time domain information for human action recognition

Real-Time Human Action Recognition System Using Depth Map Sequences

The Multidimensional Motion Features of Spatial Depth Feature Maps: An Effective Motion Information Representation Method for Video-Based Action Recognition

Local Feature Analysis for real-time Action Recognition

Action-Stage Emphasized Spatiotemporal VLAD for Video Action Recognition

Human action recognition using Adaptive Hierarchical Depth Motion Maps and Gabor filter

Human Action Recognition with Depth Cameras

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Learning Discriminative Features for Fast Frame-Based Action Recognition

Real-Time Human Action Recognition Using DMMs-Based LBP and EOH Features

Action Segmentation and Recognition Based on Depth HOG and Probability Distribution Difference

Recognizing actions using depth motion maps-based histograms of oriented gradients