Abstract: Multi-person 3D motion prediction is an emerging task that involves predicting the future 3D motion of multiple individuals from current observations. Unlike single-person motion prediction, this task requires learning the interaction dynamics among multiple individuals. Broadly speaking, current methods fall into two groups. The first group directly adapts models originally developed for single-person scenarios to multi-person scenarios, which is evidently suboptimal. The second group uses off-the-shelf tools such as graph convolutional networks to model interactions. Although this approach has improved results, it models interactions primarily between entire human identities rather than at a finer level of detail. This limitation motivates our solution. In this work, we propose a novel framework that addresses two key issues ignored in previous works: multi-granularity interaction and time-varying inter-person dynamics. To this end, the proposed model comprises two main modules: a person-level interaction module and a part-level interaction module. The person-level module learns the holistic, dynamic interaction among multiple persons at a coarse granularity. Crucially, a distinctive trait of this module is that it learns temporal dynamics: for example, it recognizes that two individuals are strongly correlated while shaking hands but much less so after parting ways. The part-level interaction module learns the interactions between the body joints of different persons. It operates at a finer granularity, which distinguishes it from existing approaches. By aggregating information from both granularities, our model produces accurate motion predictions.
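The abstract leaves the internals of both modules unspecified. As a rough illustration only, the following pure-Python sketch shows how a per-frame (hence time-varying) person-level weight and a cross-person joint-to-joint weight could be computed and combined. It is not the authors' method: learned attention is replaced by a parameter-free softmax over negative 3D distances with a hypothetical temperature `tau`, each person is summarized by the centroid of their joints, and the two granularities are fused by plain concatenation. All function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]   # one joint: (x, y, z)
Person = Sequence[Point]  # all joints of one person in one frame
Frame = Sequence[Person]  # all persons in one frame


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _weighted_sum(weights: List[float], points: Sequence[Point]) -> List[float]:
    """Sum of 3D points, each scaled by its weight."""
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(3)]


def _centroid(joints: Person) -> List[float]:
    """Mean joint position: a coarse, whole-person descriptor."""
    n = len(joints)
    return [sum(j[d] for j in joints) / n for d in range(3)]


def person_level_weights(frame: Frame, tau: float = 1.0) -> List[List[float]]:
    """Coarse-grained weights w[p][q] between whole persons in ONE frame.

    Hypothetical stand-in for learned attention: scores are negative centroid
    distances (self included, with score 0), so each row sums to 1. Because
    they are recomputed for every frame, the weights vary over time.
    """
    cents = [_centroid(person) for person in frame]
    return [_softmax([-_dist(cp, cq) / tau for cq in cents]) for cp in cents]


def part_level_weights(
    frame: Frame, p: int, j: int, tau: float = 1.0
) -> Tuple[List[Tuple[int, int]], List[float]]:
    """Fine-grained weights from joint j of person p to every joint (q, k)
    of every OTHER person q, again from negative 3D distances."""
    pairs = [(q, k) for q, person in enumerate(frame) if q != p
             for k in range(len(person))]
    if not pairs:
        raise ValueError("part-level interaction needs at least two persons")
    scores = [-_dist(frame[p][j], frame[q][k]) / tau for q, k in pairs]
    return pairs, _softmax(scores)


def fuse_frame(frame: Frame, tau: float = 1.0) -> List[List[List[float]]]:
    """Per-joint 9-value feature [joint, person context, part context].

    Concatenation is an assumed fusion rule; a learned decoder (not shown)
    would map these features to future poses.
    """
    cents = [_centroid(person) for person in frame]
    w = person_level_weights(frame, tau)
    fused = []
    for p, person in enumerate(frame):
        person_ctx = _weighted_sum(w[p], cents)
        feats = []
        for j, x in enumerate(person):
            pairs, a = part_level_weights(frame, p, j, tau)
            part_ctx = _weighted_sum(a, [frame[q][k] for q, k in pairs])
            feats.append(list(x) + person_ctx + part_ctx)
        fused.append(feats)
    return fused


if __name__ == "__main__":
    # Two persons, two joints each (hand, foot).
    handshake = [[(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)],   # person A
                 [(0.2, 1.0, 0.0), (0.6, 0.0, 0.0)]]   # B's hand near A's hand
    parted = [[(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)],
              [(8.0, 1.0, 0.0), (8.4, 0.0, 0.0)]]      # B has walked away
    print(person_level_weights(handshake)[0][1])  # ≈ 0.40
    print(person_level_weights(parted)[0][1])     # ≈ 0.0003
```

In this toy input, person A's person-level weight on B drops from about 0.40 during the handshake to about 0.0003 once B moves away, loosely mirroring the handshake example above, while the part-level weights let A's hand attend mostly to B's hand rather than B's foot. A trained model would learn these affinities instead of deriving them from raw distances.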
To validate the effectiveness of the proposed model, we conducted comprehensive experiments on three benchmark datasets: 3DPW, CMU-Mocap, and MuPoTS-3D. The results demonstrate that our model outperforms previous state-of-the-art methods.

MUP: Multi-granularity Unified Perception for Panoramic Activity Recognition

MPT-PAR: Mix-Parameters Transformer for Panoramic Activity Recognition

M&M: Recognizing Multiple Co-evolving Activities from Multi-source Videos

Multi-Granularity Interaction for Multi-Person 3D Motion Prediction

Hierarchical Multi-View Aggregation Network for Sensor-Based Human Activity Recognition

AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

Behavior Recognition Based on the Integration of Multigranular Motion Features

Where to Look: Multi-Granularity Occlusion Aware for Video Person Re-Identification

UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception Framework

Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph

FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs

Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

Multitask Multigranularity Aggregation With Global-Guided Attention for Video Person Re-Identification

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

Action Recognition By Learning Deep Multi-Granular Spatio-Temporal Video Representation

Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

Multi-granular inter-frame relation exploration and global residual embedding for video-based person re-identification

Action Recognition by Hierarchical Mid-level Action Elements

Detecting Group Activities with Multi-Camera Context

PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding

MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition