Abstract: 3D human motion prediction, the task of predicting future human poses on the basis of historically observed motion sequences, is a core task in computer vision. It has been successfully applied to both autonomous driving and human–robot interaction. Previous work has usually employed Recurrent Neural Network (RNN)-based models to predict future human poses. However, as previous works have amply demonstrated, RNN-based models tend to produce unrealistic and discontinuous motion because prediction errors accumulate over time. To address this, we propose a feed-forward, 3D skeleton-based model for human motion prediction. This model, the Spatial–Temporal Graph Convolutional Network (ST-GCN), automatically learns the spatial and temporal patterns of human motion from input sequences and is designed to overcome this limitation of previous approaches. Specifically, the ST-GCN model is based on an encoder-decoder architecture. The encoder consists of 5 ST-GCN modules, each comprising a spatial GCN layer and a 2D convolution-based temporal convolutional network (TCN) layer, which together encode the spatio-temporal dynamics of human motion. The decoder, consisting of 5 TCN layers, then uses the encoded spatio-temporal representation to predict future human poses. We conducted extensive experiments with the ST-GCN model on several large-scale 3D human pose datasets (Human3.6M, AMASS, 3DPW), adopting MPJPE (Mean Per Joint Position Error) as the evaluation metric. The experimental results demonstrate that our ST-GCN model outperforms the baseline models in both short-term (< 400 ms) and long-term (> 400 ms) prediction.
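As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of MPJPE under its common definition: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The function name `mpjpe`, the nested-list layout `[frames][joints][3]`, and the toy values are assumptions made for this example; they are not taken from the paper's implementation.

```python
import math


def mpjpe(predicted, target):
    """Mean Per Joint Position Error between two pose sequences.

    Both arguments are nested lists shaped [frames][joints][3] holding
    (x, y, z) joint coordinates in the same unit (for example millimetres).
    The layout is an assumption made for this sketch.
    """
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of frames")

    total = 0.0
    count = 0
    for pred_frame, true_frame in zip(predicted, target):
        if len(pred_frame) != len(true_frame):
            raise ValueError("frames must have the same number of joints")
        for pred_joint, true_joint in zip(pred_frame, true_frame):
            # Euclidean distance between predicted and ground-truth joint.
            total += math.dist(pred_joint, true_joint)
            count += 1

    if count == 0:
        raise ValueError("sequences must contain at least one joint")
    return total / count


# Toy data (hypothetical): one frame, two joints.
target = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
predicted = [[[3.0, 4.0, 0.0], [1.0, 0.0, 0.0]]]

# The first joint is off by 5 units and the second is exact, so the mean is 2.5.
error = mpjpe(predicted, target)
```

A lower value means the predicted skeleton lies closer to the ground truth. Reporting the metric separately for prediction horizons below and above 400 ms gives the short-term and long-term comparison described in the abstract.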

Towards Efficient 3D Human Motion Prediction Using Deformable Transformer-based Adversarial Network

A Spatio-Temporal Transformer Network for Human Motion Prediction in Human-Robot Collaboration

Toward Realistic 3D Human Motion Prediction with a Spatio-Temporal Cross-Transformer Approach

Towards Realistic 3D Human Motion Prediction with A Spatio-temporal Cross-transformer Approach

Forecasting Distillation: Enhancing 3D Human Motion Prediction with Guidance Regularization

AdvMT: Adversarial Motion Transformer for Long-term Human Motion Prediction

STTG-net: a Spatio-temporal Network for Human Motion Prediction Based on Transformer and Graph Convolution Network

Robust Human Motion Forecasting using Transformer-based Model

3D Human Motion Prediction Based on Graph Convolution Network and Transformer

STN-enhanced Message Passing Guided by Adversarial Learning for Human Pose Estimation

3D Skeleton-Based Human Motion Prediction Using Spatial–temporal Graph Convolutional Network

High-Quality Human Motion Prediction Using Size Invariant Motion Space

KD-Former: Kinematic and dynamic coupled transformer network for 3D human motion prediction

TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

STAFFormer: Spatio-temporal Adaptive Fusion Transformer for Efficient 3D Human Pose Estimation

Joint-Aware Transformer: An Inter-Joint Correlation Encoding Transformer for Short-Term 3D Human Motion Prediction

Efficient Human Motion Prediction Using Temporal Convolutional Generative Adversarial Network

A Mixture of Experts Approach to 3D Human Motion Prediction

Spatial–temporal modeling for prediction of stylized human motion

DSTFormer: 3D Human Pose Estimation with a Dual-scale Spatial and Temporal Transformer Network

3D Human motion anticipation and classification