Abstract: Multi-person 3D motion prediction is an emerging task that involves predicting the future 3D motion of multiple individuals from their observed motion. In contrast to single-person motion prediction, this task places strong emphasis on learning the interaction dynamics among individuals. Current methods fall broadly into two groups. The first adapts models originally developed for single-person scenarios directly to multi-person scenarios, which is suboptimal for a task that depends on inter-person interaction. The second uses off-the-shelf tools such as graph convolutional networks to model interactions. While this approach improves results, it models interactions primarily between whole persons rather than at finer levels of detail. To address this limitation, we propose a novel framework that targets two key issues overlooked in previous work: multi-granularity interaction and time-varying inter-person dynamics. To this end, the proposed model comprises two main modules: a person-level interaction module and a part-level interaction module. The former learns the holistic, dynamic interaction among multiple persons at a coarse granularity. A distinguishing trait of this module is that it learns temporal dynamics: for example, it recognizes that two individuals are strongly correlated while shaking hands but less correlated after parting ways. The latter learns the interaction between the body joints of different persons, operating at a finer granularity than existing approaches. By aggregating information from both granularities, our model enables accurate motion prediction.
To validate the effectiveness of the proposed model, we conducted comprehensive experiments on three benchmark datasets: 3DPW, CMU-Mocap, and MuPoTS-3D. The results show that our model outperforms previous state-of-the-art methods.
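As a rough illustration of the two granularities described above, the following NumPy sketch computes a per-frame person-to-person weight matrix (coarse) and a per-frame joint-to-joint weight matrix restricted to joints of different persons (fine), then concatenates the two aggregated signals. It is not the proposed model: the learned interaction modules are replaced here by fixed softmax weights over negative squared distances, and all function names, tensor layouts, and the toy "handshake then part ways" data are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def person_level_weights(motion):
    """Coarse granularity. motion: (P, T, J, 3) -> weights (T, P, P).

    One row-stochastic matrix per frame, so the person-to-person
    correlation can change over time. Assumption: similarity is the
    negative squared distance between whole-body poses.
    """
    P, T, J, _ = motion.shape
    feats = motion.transpose(1, 0, 2, 3).reshape(T, P, J * 3)
    diff = feats[:, :, None, :] - feats[:, None, :, :]   # (T, P, P, J*3)
    scores = -np.sum(diff ** 2, axis=-1)                 # (T, P, P)
    return softmax(scores, axis=-1)

def part_level_weights(motion):
    """Fine granularity. motion: (P, T, J, 3) -> weights (T, P*J, P*J).

    Each joint attends only to joints of *other* persons; same-person
    pairs are masked out. Requires P >= 2.
    """
    P, T, J, _ = motion.shape
    joints = motion.transpose(1, 0, 2, 3).reshape(T, P * J, 3)
    diff = joints[:, :, None, :] - joints[:, None, :, :]  # (T, PJ, PJ, 3)
    scores = -np.sum(diff ** 2, axis=-1)                  # (T, PJ, PJ)
    person_id = np.repeat(np.arange(P), J)
    same_person = person_id[:, None] == person_id[None, :]
    scores = np.where(same_person[None], -np.inf, scores)
    return softmax(scores, axis=-1)

def aggregate(motion, w_person, w_part):
    """Concatenate coarse and fine aggregated signals -> (P, T, J, 6)."""
    P, T, J, _ = motion.shape
    per_frame = motion.transpose(1, 0, 2, 3)              # (T, P, J, 3)
    coarse = (w_person @ per_frame.reshape(T, P, J * 3)).reshape(T, P, J, 3)
    fine = (w_part @ per_frame.reshape(T, P * J, 3)).reshape(T, P, J, 3)
    return np.concatenate([coarse, fine], axis=-1).transpose(1, 0, 2, 3)

# Toy data (hypothetical): two persons stand close for two frames
# ("handshake"), then person 1 moves away along the x axis.
P, T, J = 2, 4, 3
rng = np.random.default_rng(0)
base_pose = rng.normal(size=(J, 3))
offsets = np.array([0.1, 0.1, 2.0, 3.0])
motion = np.zeros((P, T, J, 3))
motion[0] = base_pose
motion[1] = base_pose + np.stack(
    [offsets, np.zeros(T), np.zeros(T)], axis=-1
)[:, None, :]

w_person = person_level_weights(motion)
w_part = part_level_weights(motion)
fused = aggregate(motion, w_person, w_part)
```

In this toy setup, the weight that person 0 places on person 1 (`w_person[t, 0, 1]`) is larger in the early "handshake" frames than in the final frame, mirroring the time-varying correlation described in the abstract. A real model would learn these weights from data rather than derive them from distances.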