Abstract: Data-driven modeling of human motions is ubiquitous in computer graphics and computer vision applications, such as synthesizing realistic motions or recognizing actions. Recent research has shown that such problems can be approached by using deep learning to learn a natural motion manifold, which addresses the shortcomings of traditional data-driven approaches. However, previous methods can be sub-optimal for two reasons. First, the skeletal information has not been fully utilized for feature extraction. Unlike images, where spatial proximity is defined by the pixel grid, skeletal motions have no obvious notion of spatial proximity that deep networks can readily exploit. Second, motion is time-series data with strong multi-modal temporal correlations. A frame could be followed by several candidate frames leading to different motions, and long-range dependencies exist in which frames at the beginning of a sequence correlate with frames that occur much later. Ineffective modeling would either under-estimate the multi-modality and variance, resulting in featureless mean motions, or over-estimate them, resulting in jittery motions. In this paper, we propose a new deep network that tackles these challenges by creating a natural motion manifold that is versatile for many applications. The network has a new spatial component for feature extraction. It is also equipped with a new batch prediction model that predicts a large number of frames at once, so that long-term temporally based objective functions can be employed to correctly learn the motion multi-modality and variances. With our system, long-duration motions can be predicted or synthesized in an open-loop setup in which the motion retains its dynamics accurately. The system can also be used for denoising corrupted motions and for synthesizing new motions from given control signals. We demonstrate that our system produces superior results compared to existing work in multiple applications.
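To make the idea of batch prediction with an open-loop rollout concrete, here is a minimal sketch. The abstract does not describe the network itself, so everything below is an assumption. `batch_predict` and `open_loop_rollout` are hypothetical names. Each frame is assumed to be a flat vector of joint coordinates. A constant-velocity extrapolation stands in for the trained model. The sketch only shows the control flow: a window of past frames goes in, and a whole block of future frames comes out in one call. That block is then fed back as input with no ground-truth correction.

```python
import numpy as np


def batch_predict(window: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical stand-in for a learned batch predictor (an assumption).

    window:  (T, D) array of past frames, e.g. D = 3 * number_of_joints.
    horizon: number of future frames emitted in a single call.
    Returns a (horizon, D) block. This placeholder just extrapolates the
    last observed velocity; a real system would use a trained network.
    """
    velocity = window[-1] - window[-2]
    steps = np.arange(1, horizon + 1)[:, None]  # shape (horizon, 1)
    return window[-1] + steps * velocity        # broadcasts to (horizon, D)


def open_loop_rollout(seed: np.ndarray, horizon: int, n_blocks: int) -> np.ndarray:
    """Synthesize a long motion open-loop: each predicted block becomes part
    of the input window for the next block, with no ground-truth correction."""
    if len(seed) < 2:
        raise ValueError("seed must contain at least two frames")
    context = len(seed)   # size of the sliding input window
    frames = list(seed)   # list of (D,) frame vectors
    for _ in range(n_blocks):
        window = np.stack(frames[-context:])
        block = batch_predict(window, horizon)
        frames.extend(block)  # iterating a 2-D array yields its rows
    return np.stack(frames)


# Usage: a 4-frame seed with 2 coordinates per frame, rolled out 3 blocks of 5.
seed = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
motion = open_loop_rollout(seed, horizon=5, n_blocks=3)
print(motion.shape)  # (19, 2)
```

Predicting a whole block per call is what allows a loss to be evaluated jointly over many future frames during training. The abstract gives this as the reason the batch model can learn multi-modality and variance. The constant-velocity placeholder has no such ability and is only there so that the rollout loop runs.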

Human 3D Motion Recognition Based on Spatial-Temporal Context of Joints

Human motion segmentation using collaborative representations of 3D skeletal sequences

Action Recognition Based on Global Optimal Similarity Measuring

Combining Adaptive Hierarchical Depth Motion Maps with Skeletal Joints for Human Action Recognition

Full Body Tracking-Based Human Action Recognition

An effective representation for action recognition with human skeleton joints

Human 3D Model-based 2D Action Recognition

Human Action Recognition with Contextual Constraints Using a RGB-D Sensor

Action and Gait Recognition from Recovered 3-D Human Joints

Marker-Less 3d Human Motion Capture With Monocular Image Sequence And Height-Maps

Long-Term Human Motion Prediction with Scene Context

3d Body Joints-Based Human Action Recognition

Multimodal human action recognition based on spatio-temporal action representation recognition model

3D Skeletal Gesture Recognition Via Hidden States Exploration

Using the Representation of 3D Skeleton Snippet for Human Action Recognition

A Spatial-temporal 3D Human Pose Reconstruction Framework

Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

Explicit Spatiotemporal Joint Relation Learning for Tracking Human Pose

Spatio-temporal Manifold Learning for Human Motions via Long-horizon Modeling

A New Representation of Skeleton Sequences for 3D Action Recognition

Action Recognition Based on Joint Trajectory Maps with Convolutional Neural Networks