Abstract: Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.
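
The "simple baseline that does not attempt to model motion at all" can be illustrated with a constant-pose predictor that repeats the last observed frame (the paper refers to this idea as a zero-velocity baseline). The snippet below is only a minimal sketch: the function names, the `(frames, pose_dim)` array layout, and the per-frame Euclidean error are assumptions chosen for illustration, not the paper's evaluation code.

```python
import numpy as np


def zero_velocity_forecast(observed, horizon):
    """Repeat the last observed pose `horizon` times.

    observed: array of shape (T, D), one pose vector per frame (assumed layout).
    returns:  array of shape (horizon, D).
    """
    observed = np.asarray(observed, dtype=float)
    last_pose = observed[-1]
    return np.tile(last_pose, (horizon, 1))


def mean_frame_error(predicted, target):
    """Mean per-frame Euclidean distance (an illustrative metric, not the paper's)."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.linalg.norm(predicted - target, axis=1)))
```

A learned model is only informative if its error is lower than what this predictor obtains under the same metric and horizon.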

What problem does this paper attempt to address?

This paper addresses several key problems in human motion prediction. Specifically, it attempts to:

1. **Improve the performance of short-term motion prediction**: Existing deep recurrent neural network (RNN) methods produce a noticeable discontinuity in short-term prediction, especially at the first predicted frame. This limits their usefulness in practical applications such as visual tracking. The paper introduces a residual architecture and a sampling-based loss function, which make the predictions smoother and reduce their error.
2. **Reduce the complexity of hyper-parameter tuning**: Existing methods usually require careful hyper-parameter tuning, in particular of the noise schedule. This tuning is difficult to carry out and can affect the final performance of the model. The proposed method requires no such additional tuning, which simplifies training.
3. **Simplify the model structure**: Existing methods usually rely on multi-layer LSTMs or SRNNs. Although these models perform well on certain tasks, they are computationally expensive and difficult to train. The paper uses a single-layer GRU without a spatial encoding layer, which greatly simplifies the model while maintaining or even improving prediction performance.
4. **Explore the training of multi-action models**: Existing methods usually train one model per action, whereas the paper trains a single model that handles multiple actions. Such a model can better exploit regularities in large-scale datasets and improve overall prediction performance.

### Main contributions of the paper

- **Proposed a new sequence-to-sequence (seq2seq) architecture**: The architecture is trained with a sampling-based loss function, which lets the model learn to recover from its own mistakes during prediction and reduces prediction discontinuity.
- **Introduced the residual architecture**: Residual connections between the input and output of each RNN unit let the model better represent the continuity of motion, especially at the first predicted frame.
- **Simplified the model structure**: Using a single-layer GRU instead of a multi-layer LSTM or SRNN reduces the computational cost and improves training efficiency.
- **Explored the training of multi-action models**: By training a single model that handles multiple actions, the paper shows the potential of this approach for improving prediction performance.

### Experimental results

The paper verifies the effectiveness of the proposed method through a series of experiments:

1. **Sequence-to-sequence architecture and sampling-based loss**: The sequence-to-sequence architecture trained with the sampling-based loss performs comparably to existing methods in short-term motion prediction and generates more plausible motion in long-term prediction.
2. **Residual architecture**: With the residual architecture, the short-term prediction error is significantly reduced and the predictions are smoother.
3. **Multi-action model**: Training a single model on multiple actions improves prediction performance and shows the benefit of training on larger datasets.

Overall, the proposed architecture and training changes address the main deficiencies of existing human motion prediction methods and suggest directions for further research in this field.
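
The residual connection and the sampling-based loss described above can be sketched together in a few lines of NumPy. This is a rough illustration under stated assumptions, not the paper's implementation: the class and function names are hypothetical, the GRU gates omit bias terms for brevity, the weights are random rather than trained, and mean squared error stands in for whatever loss the authors use. The two ideas it shows are that each step predicts an offset added to the input pose, and that the decoder consumes its own predictions rather than ground-truth frames when the loss is computed.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ResidualGRUDecoder:
    """Single-layer GRU whose output is added to its input pose (hypothetical sketch)."""

    def __init__(self, pose_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = pose_dim + hidden_dim
        # Gate weights act on the concatenated [pose, hidden] vector; biases omitted.
        self.W_z = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_r = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        # Linear map from the hidden state to a pose offset.
        self.W_out = rng.normal(0.0, 0.1, (pose_dim, hidden_dim))
        self.b_out = np.zeros(pose_dim)

    def step(self, pose, h):
        xh = np.concatenate([pose, h])
        z = sigmoid(self.W_z @ xh)                      # update gate
        r = sigmoid(self.W_r @ xh)                      # reset gate
        h_tilde = np.tanh(self.W_h @ np.concatenate([pose, r * h]))
        h_new = (1.0 - z) * h + z * h_tilde
        # Residual connection: the network predicts a change relative to the input pose.
        next_pose = pose + self.W_out @ h_new + self.b_out
        return next_pose, h_new

    def rollout(self, last_pose, h, horizon):
        """Feed each prediction back in as the next input."""
        poses = []
        pose = np.asarray(last_pose, dtype=float)
        for _ in range(horizon):
            pose, h = self.step(pose, h)
            poses.append(pose)
        return np.stack(poses)


def sampling_based_loss(decoder, last_pose, h0, target):
    """MSE between a self-fed rollout and the ground-truth future frames."""
    target = np.asarray(target, dtype=float)
    pred = decoder.rollout(last_pose, h0, len(target))
    return float(np.mean((pred - target) ** 2))
```

With the output weights set to zero, the rollout reduces to repeating the last pose, so the residual model starts from a continuous prediction and only has to learn the deviation from it. Because the loss is computed on a self-fed rollout, no noise schedule has to be tuned to expose the model to its own errors.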

On human motion prediction using recurrent neural networks

How Deep Neural Networks Understand Motion? Toward Interpretable Motion Modeling by Leveraging the Relative Change in Position

An Improved GRU Network for Human Motion Prediction

Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamic.

Human Motion Prediction Based on Graph Convolutional Networks and Multilayer Perceptron

Human motion prediction with gated recurrent unit model of multi-dimensional input

History Repeats Itself: Human Motion Prediction via Motion Attention

Multitask Non-Autoregressive Model For Human Motion Prediction

Probabilistic Human Motion Prediction Via A Bayesian Neural Network

Recurrent Neural Network for Motion Trajectory Prediction in Human-Robot Collaborative Assembly

RNN-Based Human Motion Prediction Via Differential Sequence Representation

3D Skeleton-Based Human Motion Prediction Using Spatial–temporal Graph Convolutional Network

Efficient Human Motion Prediction Using Temporal Convolutional Generative Adversarial Network

Parallel multi-stage rectification networks for 3D skeleton-based motion prediction

Multiscale Spatial and Temporal Learning for Human Motion Prediction

Velocity-to-velocity human motion forecasting

A multilayer human motion prediction perceptron by aggregating repetitive motion

Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

KP-RNN: A Deep Learning Pipeline for Human Motion Prediction and Synthesis of Performance Art

Fusion learning-based recurrent neural network for human motion prediction

DMS-GCN: Dynamic Mutiscale Spatiotemporal Graph Convolutional Networks for Human Motion Prediction