Abstract: 3D human motion prediction, i.e., predicting future human poses on the basis of historically observed motion sequences, is a core task in computer vision. It has been successfully applied to both autonomous driving and human–robot interaction. Previous work has usually employed Recurrent Neural Network (RNN)-based models to predict future human poses. However, as that work has amply demonstrated, RNN-based models tend to produce unrealistic and discontinuous motion because prediction errors accumulate over time. To address this, we propose a feed-forward, 3D skeleton-based model for human motion prediction, the Spatial–Temporal Graph Convolutional Network (ST-GCN), which automatically learns the spatial and temporal patterns of human motion from input sequences, overcoming the limitations of previous approaches. Specifically, our ST-GCN model is based on an encoder-decoder architecture. The encoder consists of 5 ST-GCN modules, each comprising a spatial GCN layer and a 2D convolution-based temporal convolutional network (TCN) layer, which together encode the spatio-temporal dynamics of human motion. The decoder, consisting of 5 TCN layers, then uses the encoded spatio-temporal representation to predict future human poses. We conducted extensive experiments with the ST-GCN model on several large-scale 3D human pose datasets (Human3.6M, AMASS, 3DPW), adopting the Mean Per Joint Position Error (MPJPE) as the evaluation metric. The experimental results demonstrate that our ST-GCN model outperforms the baseline models in both short-term (< 400 ms) and long-term (> 400 ms) prediction.
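
For readers unfamiliar with the evaluation metric, MPJPE can be illustrated with a minimal sketch. It assumes poses are given as nested sequences of frames, each holding (x, y, z) joint coordinates; the function name `mpjpe` and the toy values are illustrative assumptions, not the evaluation code used in the experiments.

```python
import math


def mpjpe(pred, target):
    """Mean Per Joint Position Error.

    pred, target: sequences of frames; each frame is a sequence of
    (x, y, z) joint positions. Returns the Euclidean distance between
    predicted and ground-truth joints, averaged over all joints and
    frames, in the same unit as the inputs (commonly millimetres).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        if len(pred_frame) != len(target_frame):
            raise ValueError("frames must have the same number of joints")
        for pred_joint, target_joint in zip(pred_frame, target_frame):
            total += math.dist(pred_joint, target_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Toy data (hypothetical): 2 frames, 2 joints per frame.
target = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 1.0, 0.0), (1.0, 1.0, 2.0)]]

error = mpjpe(pred, target)
print(error)  # per-joint distances are 5, 0, 0, 2, so the mean is 1.75
```

A lower MPJPE means the predicted skeleton lies closer to the ground truth; motion prediction benchmarks typically report it separately for each prediction horizon.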

3D Skeleton-Based Human Motion Prediction Using Spatial–temporal Graph Convolutional Network
