Spatial-Temporal Graph U-Net for Skeleton-Based Human Motion Infilling

Leiyang Xu, Qiang Wang, Chenguang Yang
DOI: https://doi.org/10.1109/icit58233.2024.10540720
2024-01-01
Abstract: Motion infilling is a fundamental and challenging research problem in human motion modeling and analysis. Given the motion sequences before and after a gap, the goal is to generate natural, visually coherent transitions that fill in the missing frames. However, most current methods ignore the spatial structure formed by the joints and may therefore lose spatial information. This work proposes a novel spatiotemporal graph U-Net that supports flexible inputs for skeleton-based motion infilling. Spatiotemporal graph convolutional layers, skeleton pooling layers, and skeleton unpooling layers extract spatial and temporal features from the motion sequence, while the U-Net structure integrates the information in the start and end motion sequences. In addition, a generative adversarial mechanism is introduced to ensure that the generated skeleton poses are smooth and natural. We conduct experiments on two motion datasets: one large-scale public dataset and one self-built dataset. The model accepts either joint quaternions or joint coordinates as input. Experimental results show that our method can improve the performance of skeleton-based motion infilling and achieves state-of-the-art results when joint coordinates are used as input.
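The abstract names the building blocks (spatiotemporal graph convolution, skeleton pooling and unpooling, a U-Net skip connection, masked start/end frames) but not their exact definitions. The NumPy sketch below shows one common way such blocks can fit together. It rests on assumptions that are not taken from the paper: an ST-GCN-style layer with a symmetrically normalized adjacency, pooling as averaging over disjoint body-part groups, unpooling by copying, and missing frames zero-filled with an extra mask channel. All function names, the 5-joint toy skeleton, and the random weights are hypothetical. It is forward-only and omits the adversarial discriminator.

```python
import numpy as np

# Hypothetical sketch; layer definitions are assumptions, not the paper's exact design.

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def st_graph_conv(x, a_hat, w_spatial, k_temporal):
    """x: (T, J, C_in) -> (T, J, C_out).
    Spatial step: aggregate graph neighbours, then mix channels with w_spatial (C_in, C_out).
    Temporal step: 1-D convolution over frames with an odd-length kernel shared by all joints/channels."""
    h = np.einsum("ij,tjc->tic", a_hat, x) @ w_spatial
    pad = len(k_temporal) // 2
    hp = np.pad(h, ((pad, pad), (0, 0), (0, 0)))  # zero-pad the time axis so T is preserved
    out = sum(w * hp[i:i + h.shape[0]] for i, w in enumerate(k_temporal))
    return np.maximum(out, 0.0)  # ReLU

def pooling_matrix(groups, num_joints):
    """(P, J) matrix that averages the joints of each (disjoint) body-part group."""
    p = np.zeros((len(groups), num_joints))
    for g, joints in enumerate(groups):
        p[g, joints] = 1.0 / len(joints)
    return p

def skeleton_pool(x, p):
    return np.einsum("pj,tjc->tpc", p, x)  # (T, J, C) -> (T, P, C)

def skeleton_unpool(y, p):
    u = (p > 0).astype(float).T  # (J, P): copy each part's feature back to its joints
    return np.einsum("jp,tpc->tjc", u, y)  # (T, P, C) -> (T, J, C)

def toy_unet_infill(motion, known, params, a_full, a_pooled, p):
    """motion: (T, J, C) joint features; known: (T,) bool, True for given start/end frames."""
    mask = np.broadcast_to(known[:, None, None], motion.shape[:2] + (1,)).astype(float)
    x = np.concatenate([motion * mask, mask], axis=-1)  # hide missing frames, add a mask channel
    e1 = st_graph_conv(x, a_full, params["w1"], params["k"])  # encoder at joint level
    b = st_graph_conv(skeleton_pool(e1, p), a_pooled, params["w2"], params["k"])  # coarse level
    d1 = np.concatenate([skeleton_unpool(b, p), e1], axis=-1)  # U-Net skip connection
    return d1 @ params["w3"]  # linear head back to C output channels

# Usage on a toy skeleton: 0 pelvis, 1 spine, 2 head, 3 left hand, 4 right hand.
rng = np.random.default_rng(0)
T, J, C, H = 8, 5, 3, 16
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
groups = [[0, 1], [2], [3, 4]]  # torso, head, hands
p = pooling_matrix(groups, J)
a_full = normalized_adjacency(edges, J)
a_pooled = normalized_adjacency([(0, 1), (0, 2)], len(groups))
params = {"w1": rng.normal(size=(C + 1, H)), "w2": rng.normal(size=(H, H)),
          "w3": rng.normal(size=(2 * H, C)), "k": np.array([0.25, 0.5, 0.25])}
known = np.array([True, True, False, False, False, False, True, True])  # start + end frames
motion = rng.normal(size=(T, J, C))
pred = toy_unet_infill(motion, known, params, a_full, a_pooled, p)
print(pred.shape)  # (8, 5, 3)
```

Because missing frames are multiplied by the mask before the first layer, whatever values they hold cannot influence the prediction; only the start and end frames carry information into the network. The skip connection concatenates joint-level encoder features with the unpooled coarse features, which is the U-Net pattern the abstract refers to.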
What problem does this paper attempt to address?