Spatiotemporal Consistency Learning from Momentum Cues for Human Motion Prediction

Haipeng Chen, Jiahui Hu, Wenyin Zhang, Pengxiang Su
DOI: https://doi.org/10.1109/tcsvt.2023.3284013
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Extrapolating future human motion from a historical pose sequence is the foundation of various intelligent applications. Numerous deep learning-based algorithms have been designed to address this task, achieving state-of-the-art performance on different human motion benchmark datasets. However, most existing methods rely on the three-dimensional coordinates of joints, which represent dynamic motion contexts only implicitly, so capturing motion information from the pose sequence remains challenging. In this paper, we advocate explicitly describing dynamic contexts via momentum in the mechanical space of human motion, as the momentum of a joint is explicit, temporally consistent, and can provide abundant information to the model. In addition, single-stream methods play a dominant role in the field of human motion prediction. They usually capture motion information through either continuous or sparse sampling, which may miss global or detailed local information, respectively. Therefore, we present a simple yet effective dual-stream method that considers both detailed and global temporal information by combining continuous and sparse sampling. The proposed dual-stream paradigm improves computational efficiency and short-term prediction accuracy concurrently. Furthermore, we present a novel temporal attention-based graph convolutional network (TA-GCN) to derive a spatiotemporally consistent motion representation that adequately accounts for the rationality of the human body topology. Extensive experiments on two large motion prediction benchmark datasets (i.e., Human3.6M and CMU Mocap) show that our algorithm achieves state-of-the-art performance both qualitatively and quantitatively.
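The two input-side ideas in the abstract, momentum cues and combined continuous/sparse sampling, can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption for illustration: momentum is approximated as mass times a finite-difference velocity with unit mass per joint, continuous sampling is taken to mean the most recent frames, sparse sampling is taken to mean every stride-th frame, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence

# One frame is a list of joints; each joint is [x, y, z].
Pose = List[List[float]]


def momentum_cues(poses: Sequence[Pose],
                  masses: Optional[Sequence[float]] = None,
                  dt: float = 1.0) -> List[Pose]:
    """Approximate per-joint momentum as m * (x_t - x_{t-1}) / dt.

    Assumption: unit mass per joint unless `masses` is given.
    Returns len(poses) - 1 frames.
    """
    if masses is None:
        masses = [1.0] * len(poses[0])
    cues = []
    for prev, curr in zip(poses, poses[1:]):
        cues.append([
            [m * (c - p) / dt for p, c in zip(prev_joint, curr_joint)]
            for m, prev_joint, curr_joint in zip(masses, prev, curr)
        ])
    return cues


def continuous_sample(frames: Sequence[Pose], length: int) -> List[Pose]:
    """Keep the most recent `length` frames (detailed local context)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return list(frames[-length:])


def sparse_sample(frames: Sequence[Pose], stride: int) -> List[Pose]:
    """Keep every `stride`-th frame, counted back from the last frame
    (coarse global context over the whole history)."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return list(frames[::-1][::stride][::-1])


# Toy history: a single joint accelerating along x (x = t^2 for t = 0..5).
history = [[[float(t * t), 0.0, 0.0]] for t in range(6)]
cues = momentum_cues(history)
local_stream = continuous_sample(cues, 2)   # last two momentum frames
global_stream = sparse_sample(cues, 2)      # every second momentum frame
```

In this toy case the momentum along x grows linearly (1, 3, 5, 7, 9), so the local stream holds the last two values and the global stream spans the full history at half the frame rate. How the two streams are encoded and fused by TA-GCN is not described in the abstract and is not sketched here.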