Abstract: Existing graph convolutional networks for human motion prediction largely adopt a one-step scheme that outputs the prediction directly from the history input, failing to exploit human motion patterns. We observe that human motions have transitional patterns and can be split into snippets representative of each transition. Each snippet can be reconstructed from its starting and ending poses, referred to as the transitional poses. We propose a snippet-to-motion multi-stage framework that breaks motion prediction into sub-tasks that are easier to accomplish. Each sub-task integrates three modules: transitional pose prediction, snippet reconstruction, and snippet-to-motion prediction. Specifically, we propose to first predict only the transitional poses. Then we use them to reconstruct the corresponding snippets, obtaining a close approximation to the true motion sequence. Finally, we refine them to produce the final prediction output. To implement the network, we propose a novel unified graph modeling, which allows for direct and effective feature propagation compared to existing approaches that rely on separate space-time modeling. Extensive experiments on the Human3.6M, CMU Mocap, and 3DPW datasets verify the effectiveness of our method, which achieves state-of-the-art performance.

What problem does this paper attempt to address?

The paper addresses a limitation of most existing human motion prediction methods based on Graph Convolutional Networks (GCNs): they adopt a one-step scheme that predicts the output directly from the historical input, and so fail to fully exploit human motion patterns. Specifically, although existing GCNs capture spatial relationships within a single frame well, they lack an explicit mechanism for modeling how human motion evolves over time. They usually aggregate information across frames, for example with Temporal Convolutional Networks (TCNs) along the time axis, without explicitly modeling the temporal context. This limits their ability to capture sequential patterns, subtle motion transitions, and fine-grained temporal dependencies in human motion.

To solve this problem, the authors propose a snippet-based method that combines a snippet-level motion representation with a snippet-to-motion prediction framework. The core observations are that predicting a few key poses is easier than predicting the entire sequence, and that human motion often exhibits a multi-stage pattern. The paper therefore proposes a multi-stage framework that decomposes motion prediction into more tractable sub-tasks, each of which contains three modules:

1. **Transitional Pose Prediction**: First, predict the future transitional poses, that is, the starting and ending poses of each snippet.
2. **Snippet Reconstruction**: Then, use these transitional poses to reconstruct the corresponding motion snippets, for example through linear interpolation, obtaining a close approximation to the true motion sequence.
3. **Snippet-to-Motion Prediction**: Finally, assemble these snippets and refine them to produce the final predicted motion sequence.

To implement this framework, the authors propose a unified graph modeling method that allows direct and effective feature propagation, in contrast to existing methods that rely on separate spatial and temporal modeling. Experimental results show that the method achieves state-of-the-art performance on the Human3.6M, CMU Mocap, and 3DPW benchmark datasets. In summary, the paper aims to improve the accuracy and robustness of human motion prediction, especially in long-term prediction, by introducing snippet-based motion representations and a multi-stage prediction framework.
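The reconstruction step described above, in which transitional poses are turned back into an approximate motion sequence through linear interpolation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `reconstruct_snippets`, the array layout (poses as `(joints, 3)` coordinates), and the assumption that every snippet has the same length are all hypothetical choices made for illustration.

```python
import numpy as np


def reconstruct_snippets(transitional_poses, snippet_len):
    """Linearly interpolate between consecutive transitional poses.

    Hypothetical layout (an assumption, not taken from the paper):
      transitional_poses: shape (K + 1, J, 3), the last observed pose
          followed by K predicted transitional poses.
      snippet_len: frames per snippet, assumed equal for all snippets.
    Returns an array of shape (K * snippet_len, J, 3). The last frame of
    each snippet coincides with that snippet's ending transitional pose.
    """
    poses = np.asarray(transitional_poses, dtype=float)
    starts, ends = poses[:-1], poses[1:]              # (K, J, 3) each
    # Weights 1/L, 2/L, ..., 1 so every snippet ends exactly on its end pose.
    w = np.arange(1, snippet_len + 1) / snippet_len   # (L,)
    w = w[None, :, None, None]                        # (1, L, 1, 1)
    frames = (1.0 - w) * starts[:, None] + w * ends[:, None]  # (K, L, J, 3)
    return frames.reshape(-1, *poses.shape[1:])


# Usage: 2 snippets of 10 frames each for a 22-joint skeleton.
rng = np.random.default_rng(0)
key_poses = rng.normal(size=(3, 22, 3))
coarse_motion = reconstruct_snippets(key_poses, snippet_len=10)
print(coarse_motion.shape)  # (20, 22, 3)
```

In the framework summarized above, a coarse sequence like `coarse_motion` would only be an initial approximation; the snippet-to-motion module would then refine it into the final prediction.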

Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction
