Class-guided Human Motion Prediction Via Multi-Spatial-temporal Supervision

Jinkai Li, Honghu Pan, Lian Wu, Chao Huang, Xiaoling Luo, Yong Xu
DOI: https://doi.org/10.1007/s00521-023-08362-x
2023-01-01
Neural Computing and Applications
Abstract: As an important and challenging task in computer vision, human motion prediction aims to predict a future human motion sequence from a given historical sequence. Although existing works perform well with well-designed networks, they fail to exploit the semantic information within the input sequences. Inspired by the observation that a human motion sequence strongly correlates with its semantic class, we propose a class-guided network to predict future human poses. Specifically, the semantic class of the historical motion sequence is integrated through a class-guided loss function, which guides the network to predict semantic-specific poses. Furthermore, we devise two extra spatial-temporal supervision signals to improve the stability and smoothness of the predicted motion sequence: the spatial multi-scale loss promotes stability by minimizing the difference between the predictions and the ground truth at multiple scales, and the multi-temporal loss enhances smoothness by narrowing the kinetics difference between human motion sequences. Experimental results on two benchmark datasets (i.e., Human3.6M and CMU Mocap) demonstrate that the proposed supervisions effectively improve prediction accuracy, and that our method achieves new state-of-the-art performance. Our code is available at https://github.com/cobblestones/CGHMP.
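As an illustration only, the three supervision signals named in the abstract could be combined roughly as in the NumPy sketch below. This is not the authors' implementation: the joint grouping used for the coarser scales, the use of finite differences over time as the "kinetics" terms, the cross-entropy form of the class term, and the weights `w_cls`, `w_ms`, `w_mt` are all assumptions made for the example, as are every function and parameter name.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint Euclidean distance; inputs have shape (T, J, 3)."""
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))

def pool_joints(seq, groups):
    """Coarsen a skeleton by averaging the joints in each group.
    The grouping is hypothetical, not taken from the paper."""
    return np.stack([seq[:, g, :].mean(axis=1) for g in groups], axis=1)

def multi_scale_loss(pred, target, scales):
    """Pose error on the full skeleton plus the error at each coarser scale."""
    loss = mpjpe(pred, target)
    for groups in scales:
        loss += mpjpe(pool_joints(pred, groups), pool_joints(target, groups))
    return loss

def multi_temporal_loss(pred, target, orders=(1, 2)):
    """Error between temporal finite differences of the two sequences
    (order 1 is velocity-like, order 2 is acceleration-like); an assumed
    reading of 'kinetics difference'."""
    return sum(
        mpjpe(np.diff(pred, n=k, axis=0), np.diff(target, n=k, axis=0))
        for k in orders
    )

def class_loss(logits, label):
    """Cross-entropy of action-class logits against an integer label
    (assumed form of the class-guided term)."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def total_loss(pred, target, logits, label, scales,
               w_cls=1.0, w_ms=1.0, w_mt=1.0):
    """Weighted sum of the three terms; the weights are placeholders."""
    return (w_cls * class_loss(logits, label)
            + w_ms * multi_scale_loss(pred, target, scales)
            + w_mt * multi_temporal_loss(pred, target))
```

In this sketch, a prediction that differs from the ground truth only by a constant offset incurs a spatial penalty but essentially no temporal one, which reflects the separation of stability and smoothness described in the abstract.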