Multi-Scale Spatio-Temporal Aggregation Network for Human Motion Prediction.

Haoyu Su, Shenglan Liu, Zewen Gao, Yifeng Dong, Junshi Yang, Suhao Ding
DOI: https://doi.org/10.1109/ISKE60036.2023.10481288
2023-01-01
Abstract: Human motion prediction is a fundamental problem in computer vision that aims to predict future motion sequences from historical ones. Recent work has shown that Graph Convolutional Networks (GCNs) perform well in modeling the correlations between human joints, and Temporal Convolutional Networks (TCNs) are widely used for sequence problems. However, the locality of convolution operations makes it difficult to model relations between distant joints and long-term temporal information. To solve this problem, we propose a Multi-Scale Spatio-Temporal Graph Convolution (MST-GC) module and a Multi-Scale Temporal Convolution (M-TC) module, which decompose the local convolution into a set of sub-convolutions that allow each joint to establish connections with distant nodes in both the spatial and temporal dimensions. This enlarges the receptive field of the model, so that it better captures the spatio-temporal dependencies of human motion sequences. By combining these two modules, we further propose a novel Multi-Scale Spatio-Temporal Aggregation Network (MSTAN). Extensive experiments show that the proposed MSTAN outperforms state-of-the-art methods in both short- and long-term motion prediction on the Human3.6M and AMASS datasets.
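The idea of splitting one local convolution into several sub-convolutions that reach farther in time can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the sub-convolutions are built or aggregated, so the use of dilation to set each scale, the averaging of branches, and the names `dilated_conv1d`, `multi_scale_temporal_conv`, and `receptive_field` are all assumptions made for illustration.

```python
def dilated_conv1d(x, kernel, dilation):
    """Same-length 1D cross-correlation with zero padding.

    Taps are spaced `dilation` frames apart, so a larger dilation
    lets the same small kernel reach more distant frames.
    """
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - k // 2) * dilation
            if 0 <= idx < len(x):  # frames outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_scale_temporal_conv(x, kernel, dilations):
    """Run one sub-convolution per dilation and average the branches.

    Assumption: dilation defines the scales and averaging is the
    aggregation; the abstract specifies neither.
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def receptive_field(kernel_size, dilations):
    """Number of frames spanned by the widest branch."""
    return max((kernel_size - 1) * d + 1 for d in dilations)


# A single impulse at frame 5 shows which output frames it influences.
impulse = [0.0] * 11
impulse[5] = 1.0
response = multi_scale_temporal_conv(impulse, [1.0, 1.0, 1.0], [1, 2, 4])
touched = [t for t, v in enumerate(response) if v != 0.0]
```

With a kernel of size 3 and the hypothetical dilations 1, 2, and 4, the impulse reaches frames 1 through 9, whereas a single undilated kernel would reach only frames 4 through 6. This is the sense in which a set of sub-convolutions enlarges the receptive field without enlarging the kernel.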