Abstract: 3D human motion prediction, i.e., predicting future human poses on the basis of historically observed motion sequences, is a core task in computer vision. It has been applied successfully to both autonomous driving and human–robot interaction. Previous work has typically used Recurrent Neural Network (RNN)-based models to predict future human poses. However, as earlier studies have amply demonstrated, RNN-based models tend to produce unrealistic and discontinuous motion because prediction errors accumulate over time. To address this, we propose a feed-forward, 3D skeleton-based model for human motion prediction: the Spatial–Temporal Graph Convolutional Network (ST-GCN), which automatically learns the spatial and temporal patterns of human motion from input sequences. Being feed-forward, the model is designed to avoid the error accumulation that limits RNN-based approaches. Specifically, our ST-GCN model is based on an encoder-decoder architecture. The encoder consists of five ST-GCN modules, each comprising a spatial GCN layer and a 2D convolution-based temporal convolutional network (TCN) layer, which together encode the spatio-temporal dynamics of human motion. The decoder, consisting of five TCN layers, then uses the encoded spatio-temporal representation to predict future human poses. We conducted extensive experiments with the ST-GCN model on several large-scale 3D human pose datasets (Human3.6M, AMASS, and 3DPW), using the Mean Per Joint Position Error (MPJPE) as the evaluation metric. The experimental results show that our ST-GCN model outperforms the baseline models in both short-term (< 400 ms) and long-term (> 400 ms) prediction.
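To make the evaluation metric concrete, the following is a minimal sketch of how MPJPE is commonly computed: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The function name `mpjpe`, the nested-list layout `[frames][joints][3]`, and the toy values are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math


def mpjpe(predicted, target):
    """Mean Per Joint Position Error over a pose sequence.

    Both arguments are nested lists shaped [frames][joints][3] (x, y, z)
    in the same unit (for example millimetres). This layout is an
    assumption made for the example, not the paper's data format.
    """
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of frames")

    total_distance = 0.0
    joint_count = 0
    for pred_frame, true_frame in zip(predicted, target):
        if len(pred_frame) != len(true_frame):
            raise ValueError("frames must have the same number of joints")
        for pred_joint, true_joint in zip(pred_frame, true_frame):
            # Euclidean distance between predicted and true joint position
            total_distance += math.dist(pred_joint, true_joint)
            joint_count += 1

    if joint_count == 0:
        raise ValueError("sequences must contain at least one joint")
    return total_distance / joint_count


# Toy data (hypothetical): 2 frames, 2 joints per frame.
target = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
]
predicted = [
    [[3.0, 4.0, 0.0], [1.0, 0.0, 0.0]],  # first joint is off by 5 units
    [[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]],  # second joint is off by 2 units
]

error = mpjpe(predicted, target)
print(error)  # (5 + 0 + 0 + 2) / 4 = 1.75
```

A lower value means the predicted skeleton lies closer to the ground truth. In a setting like the one described above, the same function could be applied separately to frames within and beyond the 400 ms horizon to report short-term and long-term errors.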

A Non-autoregressive Decoding Model Based on Joint Classification for 3D Human Pose Regression

Semantic Graph Convolutional Networks for 3D Human Pose Regression

A residual semantic graph convolutional network with high-resolution representation for 3D human pose estimation in a virtual fashion show

Symbolism and Directivity of Joint Keypoints in Temporal and Spatial Dimensions in Human Pose Prediction with GCN-Based Model

Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach

Attention Residual Network with 3D convolutional neural network for 3D Human Pose Estimation

3D Human Pose Estimation Using Improved Semantic Graph Convolutional Based on Fusing Non-local Neural Network and Multi-Head Attention

Multitask Non-Autoregressive Model For Human Motion Prediction

An Improved 3D Human Pose Estimation Model Based on Temporal Convolution with Gaussian Error Linear Units

Enhanced Spatial–temporal Dynamics in Pose Forecasting Through Multi-Graph Convolution Networks

Geometric algebra-based multiscale encoder-decoder networks for 3D motion prediction

Multi-Graph Convolution Network for Pose Forecasting

HIDE: Hierarchical Iterative Decoding Enhancement for Multi-view 3D Human Parameter Regression

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

3D Human Pose Estimation in Motion Based on Multi-Stage Regression

A Hierarchical Static-Dynamic Encoder-Decoder Structure for 3D Human Motion Prediction with Residual CNNs

Three-dimensional human pose estimation based on improved semantic graph convolution neural networks

Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training

PoseGTAC: Graph Transformer Encoder-Decoder with Atrous Convolution for 3D Human Pose Estimation

3D Skeleton-Based Human Motion Prediction Using Spatial–temporal Graph Convolutional Network

3D Human Pose Estimation Based on 2D-3D Consistency with Synchronized Adversarial Training