Abstract: 3D human motion prediction, the task of predicting future human poses on the basis of historically observed motion sequences, is a core task in computer vision. Thus far, it has been successfully applied to both autonomous driving and human–robot interaction. Previous work has usually employed Recurrent Neural Network (RNN)-based models to predict future human poses. However, as previous works have amply demonstrated, RNN-based models tend to produce unrealistic and discontinuous motion because prediction errors accumulate over time. To address this, we propose a feed-forward, 3D skeleton-based model for human motion prediction. This model, the Spatial–Temporal Graph Convolutional Network (ST-GCN), automatically learns the spatial and temporal patterns of human motion from input sequences and thereby overcomes the limitations of previous approaches. Specifically, our ST-GCN model is based on an encoder-decoder architecture. The encoder consists of 5 ST-GCN modules, each comprising a spatial GCN layer and a 2D convolution-based TCN layer, which together encode the spatio-temporal dynamics of human motion. The decoder, consisting of 5 TCN layers, then uses the encoded spatio-temporal representation to predict future human poses. We evaluated the ST-GCN model in extensive experiments on several large-scale 3D human pose datasets (Human3.6M, AMASS, 3DPW), using MPJPE (Mean Per Joint Position Error) as the evaluation metric. The experimental results demonstrate that our ST-GCN model outperforms the baseline models in both short-term (< 400 ms) and long-term (> 400 ms) predictions.
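To make the evaluation metric concrete, the following is a minimal sketch of MPJPE as it is commonly defined: the Euclidean distance between each predicted joint and its ground-truth position, averaged over joints and frames. The function names, the array layout `(frames, joints, 3)`, and the toy data are assumptions for illustration and are not taken from the paper's code; the result is in whatever unit the input coordinates use.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean Per Joint Position Error over all frames and joints.

    pred, target: arrays of shape (..., joints, 3) holding 3D joint
    coordinates (layout assumed for this sketch).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Euclidean distance per joint, taken along the coordinate axis.
    per_joint_error = np.linalg.norm(pred - target, axis=-1)
    return float(per_joint_error.mean())


def mpjpe_per_frame(pred, target):
    """MPJPE for each frame separately, given (frames, joints, 3) inputs."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Average the per-joint distances within each frame.
    return np.linalg.norm(pred - target, axis=-1).mean(axis=-1)


# Toy data (hypothetical): 2 frames, 3 joints, ground truth at the origin.
target = np.zeros((2, 3, 3))
# Every predicted joint is displaced by (3, 4, 0), i.e. a distance of 5.
pred = target + np.array([3.0, 4.0, 0.0])

overall_error = mpjpe(pred, target)
frame_errors = mpjpe_per_frame(pred, target)
```

Reporting the per-frame variant at selected prediction horizons is one way to separate short-term from long-term error, although the exact protocol used in the experiments is not specified in the abstract.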

Predicting Long-Term Skeletal Motions by a Spatio-Temporal Hierarchical Recurrent Network

Temporal Constrained Feasible Subspace Learning for Human Pose Forecasting

A Spatio-Temporal Transformer Network for Human Motion Prediction in Human-Robot Collaboration

3D Skeleton-Based Human Motion Prediction Using Spatial–temporal Graph Convolutional Network

Human Action Prediction Based On Skeleton Data

Multitask Non-Autoregressive Model For Human Motion Prediction

Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamic

Multiscale Spatial and Temporal Learning for Human Motion Prediction

Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

Spatiotemporal Co-attention Recurrent Neural Networks for Human-Skeleton Motion Prediction

An Attractor-Guided Neural Networks for Skeleton-Based Human Motion Prediction

Learning Dynamic Relationships for 3D Human Motion Prediction

High-Quality Human Motion Prediction Using Size Invariant Motion Space

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

Spatio-Temporal Branching for Motion Prediction using Motion Increments

Human motion prediction with gated recurrent unit model of multi-dimensional input

Directed Acyclic Graph Neural Network for Human Motion Prediction

Parallel multi-stage rectification networks for 3D skeleton-based motion prediction

DMS-GCN: Dynamic Mutiscale Spatiotemporal Graph Convolutional Networks for Human Motion Prediction

Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction

Toward Realistic 3D Human Motion Prediction with a Spatio-Temporal Cross-Transformer Approach