Abstract: Depth map sequences are widely used in behavior recognition because they provide depth information. However, depth pixels are only weakly correlated with one another, so much of the spatio-temporal structure of the behavior data is lost. Point cloud data provide rich spatial information and geometric features, which compensate for these shortcomings of depth images. To make further use of the geometric information of actions and to better exploit spatio-temporal structure information, this paper proposes a 4D strong spatio-temporal feature learning network for behavior recognition from point cloud sequences. A coordinate transformation is applied to a depth dataset to generate a point cloud dataset; the network then processes each frame of point cloud data and learns 4D strong spatio-temporal features (three spatial dimensions and one temporal dimension). The network consists of two modules: a spatial-level feature learning module and a temporal-level position encoding module. The spatial-level feature learning module processes and learns the spatial dimensions of the point cloud. Each frame of point cloud data passes through two progressive structure-enhanced set abstraction layers and yields a feature sequence that represents its strong spatial structure; a max pooling operation then turns it into a complete spatial-level feature sequence. The temporal-level position encoding module processes and learns the time dimension of the point cloud, injecting time-series information into the feature sequence through position encoding and related operations. Finally, the multi-level features of human actions are aggregated and classified. Experiments were carried out on three public datasets. Extensive experiments showed that the proposed network outperformed the current state-of-the-art methods.
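The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the sinusoidal form of the position encoding, and all function names are assumptions made for illustration; the set abstraction layers are not reproduced, and a generic array of per-point features stands in for their output.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud.

    Assumes a pinhole camera model; pixels with zero depth are dropped.
    """
    h, w = depth.shape
    v, u = np.indices((h, w))          # v: row index, u: column index
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

def sinusoidal_position_encoding(num_frames, dim):
    """Return a (num_frames, dim) sinusoidal encoding; dim must be even.

    The sinusoidal form is an assumption; the abstract only says
    "position encoding".
    """
    pos = np.arange(num_frames)[:, None]
    i = np.arange(0, dim, 2)[None, :]
    angles = pos / np.power(10000.0, i / dim)
    enc = np.zeros((num_frames, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def frame_features_with_time(per_point_features):
    """Max-pool (T, N, C) per-point features over N, then add the encoding.

    per_point_features is a hypothetical stand-in for the output of the
    spatial-level feature learning module.
    """
    pooled = per_point_features.max(axis=1)            # (T, C)
    return pooled + sinusoidal_position_encoding(*pooled.shape)
```

Max pooling over the point axis makes each frame's feature independent of point order, which is why the frame order has to be injected separately by the position encoding.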

Spatiotemporal Learning of Dynamic Gestures from 3D Point Cloud Data

2D Motion Detection Bounded Hand 3D Trajectory Tracking and Gesture Recognition under Complex Background

Gesture recognition based on deep deformable 3D convolutional neural networks

Gesture Recognition with a 3-D Accelerometer

Hand Gesture Recognition Using Appearance Features Based on 3D Point Cloud

Space-Time Event Clouds for Gesture Recognition: From RGB Cameras to Event Cameras

A 4D strong spatio-temporal feature learning network for behavior recognition of point cloud sequences

3D dynamic hand gestures recognition using the Leap Motion sensor and convolutional neural networks

RGC: Reliable Gesture Classification Via Wearables Using GANs-Based Data Augmentation

Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture Recognition

Computer Vision-Based Real-Time 3d Gesture Recognition Using Depth Image

A PointNet-Based Solution for 3D Hand Gesture Recognition

A Dynamic 3D Point Cloud Dataset for Immersive Applications

Selection of Large-Scale 3D Point Cloud Data Using Gesture Recognition

Multimodal Spatiotemporal Feature Map for Dynamic Gesture Recognition

Anchor-Based Spatio-Temporal Attention 3-D Convolutional Networks for Dynamic 3-D Point Cloud Sequences

Spatiotemporal features representation with dynamic mode decomposition for hand gesture recognition using deep neural networks

3D Gesture Recognition Method Based on Faster R-CNN Network

Anchor-Based Spatio-Temporal Attention 3D Convolutional Networks for Dynamic 3D Point Cloud Sequences

GesID: 3D Gesture Authentication Based on Depth Camera and One-Class Classification

Attentive 3D-Ghost Module for Dynamic Hand Gesture Recognition with Positive Knowledge Transfer