Abstract: Compared to traditional dance, intangible cultural heritage dance often involves the isotropic extension of choreographic actions, utilizing both upper and lower limbs. This choreographic style leaves distant joints with little interaction, which reduces the accuracy of existing human motion prediction methods. We therefore propose a human motion prediction method for intangible cultural heritage dance videos based on a multi-scale hypergraph convolutional network. First, the method takes as input the 3D human posture sequence from intangible cultural heritage dance videos. A hypergraph is designed according to the synergistic relationships among the human joints in these videos and is used to represent the spatial correlation of the 3D human posture. Then, a multi-scale hypergraph convolutional network is constructed, which uses multi-scale transformation operators to segment the human skeleton into different scales. This network adopts a graph structure to represent the 3D human posture at each scale. The single-scalar fusion operator then extracts the spatial features of the 3D human posture sequence by fusing the feature information of the hypergraph and the multi-scale graph. Finally, a Temporal Graph Transformer network is introduced to capture the temporal dependence among adjacent frames. This allows temporal features to be extracted from the 3D human posture sequence and, ultimately, future 3D human posture sequences to be predicted. Experiments on the Human3.6M and 3DPW datasets show that our method outperforms the Motion-Mixer and Motion-Attention algorithms in both short-term and long-term human motion prediction. In addition, ablation experiments show that our method predicts more precise 3D human pose sequences, even in the presence of isotropic extensions of the upper and lower limbs in intangible cultural heritage dance videos. This approach effectively addresses the issue of missing segments in intangible cultural heritage dance videos.
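
To illustrate how a hypergraph can tie distant joints together, the following is a minimal sketch of one hypergraph convolution step. It assumes the widely used normalized form X' = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Theta, which is not confirmed to be the exact layer used in this work. The six-joint toy skeleton, the hyperedge groupings, and the function name `hypergraph_conv` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, w=None):
    """One hypergraph convolution step (standard normalized form; an assumption).

    X:     (N, C) joint features, e.g. 3D coordinates of N joints
    H:     (N, E) incidence matrix, H[v, e] = 1 if joint v belongs to hyperedge e
    Theta: (C, C_out) learnable weights (fixed here for illustration)
    w:     (E,) hyperedge weights, defaults to all ones
    """
    H = np.asarray(H, dtype=float)
    n_edges = H.shape[1]
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)

    Dv = H @ w           # weighted vertex degrees, shape (N,)
    De = H.sum(axis=0)   # hyperedge degrees, shape (E,)

    # Guard against division by zero for isolated joints or empty hyperedges.
    Dv_inv_sqrt = 1.0 / np.sqrt(np.maximum(Dv, 1e-12))
    De_inv = 1.0 / np.maximum(De, 1e-12)

    # A = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), built step by step.
    left = Dv_inv_sqrt[:, None] * H * (w * De_inv)[None, :]   # (N, E)
    A = left @ (H.T * Dv_inv_sqrt[None, :])                   # (N, N)
    return A @ X @ Theta

# Hypothetical toy skeleton: 0 left hand, 1 right hand, 2 left foot,
# 3 right foot, 4 torso, 5 head.
# Hypothetical hyperedges (columns): upper limbs {0, 1, 4}, lower limbs {2, 3, 4},
# extremities {0, 1, 2, 3}, trunk {4, 5}.
H = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
])

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # one frame of 3D joint positions
out = hypergraph_conv(X, H, np.eye(3))
print(out.shape)
```

In this sketch the "extremities" hyperedge lets hands and feet exchange information in a single step, even though they are far apart on the skeleton graph. That is the kind of interaction between remote joints that an ordinary pairwise graph would need several hops to capture.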

Spatial-Temporal Graph U-Net for Skeleton-Based Human Motion Infilling

Learning a Deep Motion Interpolation Network for Human Skeleton Animations

Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video

Spatiotemporal Co-attention Recurrent Neural Networks for Human-Skeleton Motion Prediction

Adaptive Spatial-Temporal Graph-Mixer for Human Motion Prediction

Spatiotemporal Progressive Inward-Outward Aggregation Network for skeleton-based action recognition

Enhancing human behavior recognition with spatiotemporal graph convolutional neural networks and skeleton sequences

Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction

MFGCN: an efficient graph convolutional network based on multi-order feature information for human skeleton action recognition

An Attractor-Guided Neural Networks for Skeleton-Based Human Motion Prediction

Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction

Convolutional Autoencoders for Human Motion Infilling

Graph U-Shaped Network with Mapping-Aware Local Enhancement for Single-Frame 3D Human Pose Estimation

Conditional Directed Graph Convolution for 3D Human Pose Estimation

Dynamic Dense Graph Convolutional Network for Skeleton-based Human Motion Prediction

TrajectoryCNN: A New Spatio-Temporal Feature Learning Network for Human Motion Prediction

Priori separation graph convolution with long-short term temporal modeling for skeleton-based action recognition

Human Motion Prediction Based on a Multi-Scale Hypergraph for Intangible Cultural Heritage Dance Videos

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Graph-Guided MLP-Mixer for Skeleton-Based Human Motion Prediction