Abstract: Human action recognition in videos is useful for many applications. However, huge challenges remain in real applications because of variations in the appearance, lighting conditions, and viewing angle of the subjects. In this respect, depth data have advantages over red, green, blue (RGB) data because they carry spatial information about the distance between the object and the viewpoint. Unlike existing works, we utilize the 3-D point cloud, which contains points in the 3-D real-world coordinate system, to represent the external surface of the human body. Specifically, we propose a new robust feature, the body surface context (BSC), which describes the distribution of the relative locations of the neighbors of a reference point in the point cloud in a compact and descriptive way. The BSC encodes the cylindrical angle of the difference vector based on the characteristics of the human body, which increases the descriptiveness and discriminability of the feature. As the BSC is an approximately object-centered feature, it is robust to transformations such as translations and rotations, which are very common in real applications. Furthermore, we propose three schemes to represent human actions based on the new feature: the skeleton-based scheme, the random-reference-point scheme, and the spatial-temporal scheme. In addition, to evaluate the proposed feature, we construct a human action dataset using a depth camera. Experiments on three datasets demonstrate that the proposed feature outperforms RGB-based features and other existing depth-based features, which indicates that the BSC feature is promising for human action recognition.
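To make the idea of describing neighbor locations around a reference point more concrete, the following is a minimal illustrative sketch, not the authors' BSC implementation. It bins the difference vectors between a reference point and its neighbors in cylindrical coordinates and returns a normalized histogram. The function name `cylindrical_histogram`, the choice of the y axis as the cylinder axis, the support radius, and the bin counts are all assumptions introduced here for illustration; the abstract does not specify them.

```python
import math


def cylindrical_histogram(reference, points, radius=1.0,
                          n_radial=2, n_angular=4, n_height=2):
    """Histogram of neighbor offsets around `reference` in cylindrical bins.

    Assumptions (not from the paper): the cylinder axis is the y axis,
    taken here as the roughly upright body axis, and neighbors are kept
    only if they lie inside a cylinder of radius `radius` and
    half-height `radius` centered on the reference point.
    """
    rx, ry, rz = reference
    hist = [0.0] * (n_radial * n_angular * n_height)
    count = 0
    for (x, y, z) in points:
        # Difference vector from the reference point to the neighbor.
        dx, dy, dz = x - rx, y - ry, z - rz
        rho = math.hypot(dx, dz)  # distance from the cylinder axis
        if (dx, dy, dz) == (0.0, 0.0, 0.0):
            continue  # skip the reference point itself
        if rho >= radius or abs(dy) >= radius:
            continue  # outside the support region
        theta = math.atan2(dz, dx) % (2 * math.pi)  # angle around the axis
        r_bin = min(int(rho / radius * n_radial), n_radial - 1)
        a_bin = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)
        h_bin = min(int((dy + radius) / (2 * radius) * n_height), n_height - 1)
        hist[(r_bin * n_angular + a_bin) * n_height + h_bin] += 1.0
        count += 1
    if count:
        hist = [v / count for v in hist]  # normalize to a distribution
    return hist
```

Because the histogram depends only on difference vectors, shifting the reference point and the cloud by the same offset leaves it unchanged. This sketch uses a fixed world axis, so it does not by itself provide the rotation robustness that the abstract attributes to the BSC.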

Combining RGB and Depth Features for Action Recognition Based on Sparse Representation

Combining depth-skeleton feature with sparse coding for action recognition.

Fusion of Skeleton and RGB Features for RGB-D Human Action Recognition

Robust action recognition via borrowing information across video modalities

RGB-D action recognition using linear coding

Action Recognition in Depth Video from RGB Perspective: A Knowledge Transfer Manner

Joint Deep Learning for RGB-D Action Recognition

Action Feature Representation and Recognition Based on Depth Video

Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos

Action Recognition from Depth Sequences Using Weighted Fusion of 2D and 3D Auto-Correlation of Gradients Features

RGB-D Human Action Recognition of Deep Feature Enhancement and Fusion Using Two-Stream ConvNet

Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features

Body Surface Context: A New Robust Feature for Action Recognition from Depth Videos

RGB-D Based Action Recognition with Light-weight 3D Convolutional Networks

Action Recognition Based on 3D Skeleton and RGB Frame Fusion

Multi-dimension Feature Fusion for Action Recognition

Fusion of Skeletal and STIP-Based Features for Action Recognition with RGB-D Devices

Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

Multimodal feature fusion model for RGB-D action recognition

Action Recognition Using Action Sequences Optimization and Two-Stream 3D Dilated Neural Network

Using a Selective Ensemble Support Vector Machine to Fuse Multimodal Features for Human Action Recognition