Graph-Guided MLP-Mixer for Skeleton-Based Human Motion Prediction

Xinshun Wang, Qiongjie Cui, Chen Chen, Shen Zhao, Mengyuan Liu

2023-08-07

Abstract: In recent years, Graph Convolutional Networks (GCNs) have been widely used in human motion prediction, but their performance remains unsatisfactory. Recently, MLP-Mixer, initially developed for vision tasks, has been adopted for human motion prediction as a promising alternative to GCNs, achieving both better performance and better efficiency. Unlike GCNs, which can explicitly capture the bone-joint structure of the human skeleton by representing it as a graph with edges and nodes, MLP-Mixer relies on fully connected layers and thus cannot explicitly model such graph-like structure. To overcome this limitation of MLP-Mixer, we propose Graph-Guided Mixer, a novel approach that equips the original MLP-Mixer architecture with the capability to model graph structure. By incorporating graph guidance, our Graph-Guided Mixer can effectively capture and utilize the specific connectivity patterns within the graph representation of the human skeleton. In this paper, we first uncover a theoretical connection between MLP-Mixer and GCN that has not been explored in existing research. Building on this connection, we then present Graph-Guided Mixer, explaining how the original MLP-Mixer architecture is reinvented to incorporate guidance from graph structure. Finally, we conduct an extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets, which shows that our method achieves state-of-the-art performance.
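The abstract contrasts MLP-Mixer's fully connected layers with the explicit skeleton graph used by GCNs. The sketch below illustrates that contrast under a deliberately simple assumption about what "graph guidance" could look like: the dense J x J token-mixing matrix is multiplied element-wise by the skeleton adjacency (plus self-loops), so each joint only mixes features from joints it is connected to. This masking rule and the helper names (`token_mix`, `graph_guided_token_mix`, `with_self_loops`) are hypothetical illustrations, not the paper's published formulation.

```python
# Hypothetical sketch: dense token mixing vs. adjacency-masked ("graph-guided")
# token mixing over J joints with C channels. Pure Python, linear part only.


def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def with_self_loops(adj):
    """Return A + I so every joint also keeps its own features."""
    n = len(adj)
    return [[1 if i == j else adj[i][j] for j in range(n)] for i in range(n)]


def token_mix(weights, x):
    """Linear part of MLP-Mixer token mixing: Y = W X with a dense J x J matrix W."""
    return matmul(weights, x)


def graph_guided_token_mix(weights, adj, x):
    """Assumed graph guidance: zero out entries of W for joint pairs with no bone."""
    mask = with_self_loops(adj)
    masked = [
        [w * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]
    return matmul(masked, x)


# Toy skeleton: a 3-joint chain 0 - 1 - 2 (e.g. hip, knee, ankle).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
weights = [[1.0] * 3 for _ in range(3)]   # dense "learned" weights, all ones here
x = [[1.0], [10.0], [100.0]]              # J = 3 joints, C = 1 channel

print(token_mix(weights, x))                    # [[111.0], [111.0], [111.0]]
print(graph_guided_token_mix(weights, adj, x))  # [[11.0], [111.0], [110.0]]
```

On this three-joint chain, the dense mixer gives every joint the same sum over all inputs, while the masked version prevents joint 0 from reading joint 2 directly, which is the kind of structural prior a plain MLP-Mixer lacks.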

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses a limitation in skeleton-based human motion prediction. Existing graph convolutional networks (GCNs) can explicitly capture the joint-bone structure of the human skeleton, but their performance is still unsatisfactory, and they face challenges in modeling long-term dependencies and temporal dynamics. The multi-layer perceptron mixer (MLP-Mixer), by contrast, performs well on vision tasks and, once introduced to human motion prediction, has shown better performance and efficiency than GCNs; however, it cannot explicitly model graph-structured information. The paper therefore proposes a new method, Graph-Guided Mixer, which aims to combine the flexibility of MLP-Mixer with the structure-capturing ability of GCNs, overcoming the limitations of existing methods and improving the accuracy and efficiency of human motion prediction.

Specifically, the paper first reveals a previously unexplored theoretical connection between MLP-Mixer and GCNs, which makes it possible to inject graph-structure information into the MLP-Mixer architecture. Building on this connection, it proposes Graph-Guided Mixer, which uses the graph structure as guidance so that MLP-Mixer can effectively capture and utilize the specific connectivity patterns in the graph representation of the human skeleton. Extensive experiments on the Human3.6M, AMASS, and 3DPW datasets show that the proposed method achieves state-of-the-art performance.
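One simplified way to see a connection of the kind described above: if LayerNorm, nonlinearities, and residual connections are dropped (so a two-layer token MLP collapses to a single matrix), a Mixer block reduces to token mixing W X across joints followed by channel mixing with a weight matrix Theta, while a linear GCN layer computes A_norm X Theta with the normalized adjacency A_norm = D^-1/2 (A + I) D^-1/2. Choosing W = A_norm makes the two coincide. The sketch checks this numerically on a toy chain; it is an illustration under these stated assumptions, not the paper's derivation, and all function names are hypothetical.

```python
import math

# Hypothetical sketch: with LayerNorm, activations and residuals removed, a
# Mixer block whose token-mixing matrix equals the GCN-normalized adjacency
# computes the same output as a linear GCN layer.


def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj):
    """GCN-style symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[1.0 if i == j else float(adj[i][j]) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, x, theta):
    """Linear GCN layer: H' = A_norm X Theta (x is J x C, theta is C x C')."""
    return matmul(matmul(a_norm, x), theta)


def mixer_linear(token_w, x, channel_w):
    """Linear Mixer block written the Mixer way: mix each channel column across
    joints with token_w, then mix each joint row across channels with channel_w."""
    joints, channels = len(x), len(x[0])
    u = [[0.0] * channels for _ in range(joints)]
    for c in range(channels):                      # token mixing, column by column
        col = [x[j][c] for j in range(joints)]
        for i in range(joints):
            u[i][c] = sum(token_w[i][j] * col[j] for j in range(joints))
    out_ch = len(channel_w[0])
    return [                                       # channel mixing, row by row
        [sum(u[i][c] * channel_w[c][k] for c in range(channels)) for k in range(out_ch)]
        for i in range(joints)
    ]


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]           # toy chain 0 - 1 - 2
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]           # J = 3 joints, C = 2 channels
theta = [[0.5, -1.0], [2.0, 1.0]]                  # shared channel weights

a_norm = normalized_adjacency(adj)
gcn_out = gcn_layer(a_norm, x, theta)
mixer_out = mixer_linear(a_norm, x, theta)
assert all(abs(g - m) < 1e-9 for gr, mr in zip(gcn_out, mixer_out) for g, m in zip(gr, mr))
```

Replacing `a_norm` with an unconstrained learned J x J matrix recovers plain Mixer token mixing, so under these simplifications one way to read MLP-Mixer is as a GCN-like layer on a fully connected joint graph with learned edge weights.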


M2AST:MLP-mixer-based adaptive spatial-temporal graph learning for human motion prediction

Adaptive Spatial-Temporal Graph-Mixer for Human Motion Prediction

Learning a Deep Motion Interpolation Network for Human Skeleton Animations

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction

GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction

DMS-GCN: Dynamic Mutiscale Spatiotemporal Graph Convolutional Networks for Human Motion Prediction

Dynamic Dense Graph Convolutional Network for Skeleton-based Human Motion Prediction

AMHGCN: Adaptive multi-level hypergraph convolution network for human motion prediction

An Attractor-Guided Neural Networks for Skeleton-Based Human Motion Prediction

Gradient multi-foci networks for 3D skeleton-based human motion prediction

Geometric algebra-based multiview interaction networks for 3D human motion prediction

Enhanced Spatial–temporal Dynamics in Pose Forecasting Through Multi-Graph Convolution Networks

MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction

ChebMixer: Efficient Graph Representation Learning with MLP Mixer

Multi-Graph Convolution Network for Pose Forecasting

Multitask Non-Autoregressive Model For Human Motion Prediction

Shap-Mix: Shapley Value Guided Mixing for Long-Tailed Skeleton Based Action Recognition