Graph-Guided MLP-Mixer for Skeleton-Based Human Motion Prediction

Xinshun Wang, Qiongjie Cui, Chen Chen, Shen Zhao, Mengyuan Liu
2023-08-07
Abstract: In recent years, Graph Convolutional Networks (GCNs) have been widely used in human motion prediction, but their performance remains unsatisfactory. Recently, MLP-Mixer, initially developed for vision tasks, has been applied to human motion prediction as a promising alternative to GCNs, achieving both better performance and better efficiency. Unlike GCNs, which explicitly capture the bone-joint structure of the human skeleton by representing it as a graph with edges and nodes, MLP-Mixer relies on fully connected layers and thus cannot explicitly model this graph-like structure. To overcome this limitation of MLP-Mixer, we propose Graph-Guided Mixer, a novel approach that equips the original MLP-Mixer architecture with the capability to model graph structure. By incorporating graph guidance, Graph-Guided Mixer can effectively capture and utilize the specific connectivity patterns within the graph representation of the human skeleton. In this paper, we first uncover a theoretical connection between MLP-Mixer and GCN that is unexplored in existing research. Building on this connection, we then present Graph-Guided Mixer, explaining how the original MLP-Mixer architecture is reinvented to incorporate guidance from graph structure. Finally, we conduct an extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets, which shows that our method achieves state-of-the-art performance.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two complementary limitations in human motion prediction. Existing graph convolutional networks (GCNs) can explicitly capture the joint-bone structure of the human skeleton, but their performance is still unsatisfactory, and they struggle with long-term dependencies and temporal dynamics. The multi-layer perceptron mixer (MLP-Mixer), which performs well on vision tasks and has shown better performance and efficiency than GCNs when applied to human motion prediction, cannot explicitly model such graph structure. The paper therefore proposes a new method, Graph-Guided Mixer, which aims to combine the flexibility of MLP-Mixer with the graph-structure-capturing ability of GCNs in order to improve both the accuracy and the efficiency of human motion prediction.

Specifically, the paper first reveals a previously unexplored theoretical connection between MLP-Mixer and GCNs, which makes it possible to inject graph-structure information into the MLP-Mixer architecture. Building on this connection, it proposes Graph-Guided Mixer, which uses the graph structure as guidance so that the MLP-Mixer can capture and exploit the specific connection patterns in the graph representation of the human skeleton. Extensive experiments on the Human3.6M, AMASS, and 3DPW datasets show that the proposed method achieves state-of-the-art performance.
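
The text above does not spell out the theoretical connection or the guidance mechanism, so the following is only a minimal NumPy sketch of one common way to view the relationship, not the authors' formulation. A GCN layer mixes joints with a fixed matrix derived from the skeleton graph, while a linear token-mixing step in MLP-Mixer mixes joints with a dense, freely learned matrix. The function names, the toy 5-joint chain, and the elementwise masking used as "guidance" are all assumptions made for illustration.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetric-normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, adj, w_feat):
    # x: (joints, channels). Mixing across joints is fixed by the graph;
    # only the channel weights w_feat would be learned.
    return adj @ x @ w_feat

def token_mixing(x, w_tok):
    # Linear MLP-Mixer-style token mixing: w_tok is a dense, learnable
    # (joints x joints) matrix with no knowledge of the skeleton.
    return w_tok @ x

def graph_guided_token_mixing(x, w_tok, adj):
    # Hypothetical guidance (an assumption, not the paper's method):
    # modulate the learnable mixing weights elementwise by the graph,
    # so unconnected joints do not exchange information in this step.
    return (w_tok * adj) @ x

# Toy skeleton: a chain of 5 joints with 3 channels each (illustrative only).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = normalized_adjacency(edges, num_joints=5)
x = np.random.default_rng(0).normal(size=(5, 3))

# With w_tok set to the adjacency and identity channel weights,
# token mixing and the GCN layer compute the same thing.
same = np.allclose(token_mixing(x, adj), gcn_layer(x, adj, np.eye(3)))
```

In this reading, token mixing behaves like a graph convolution whose adjacency is learned from scratch, which is why injecting the known skeleton graph into the mixing weights is a natural way to add structure. How the paper actually derives and applies the guidance should be taken from the paper itself.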