MFOGCN: Multi-Feature-based Orthogonal Graph Convolutional Network for 3D Human Motion Prediction
Jianfeng Tu, Tuo Zang, Mengran Duan, Hanrui Jiang, Jiahui Zhao, Nan Jiang, Lingfeng Liu
DOI: https://doi.org/10.1007/s00371-023-03152-x
2024-01-01
Abstract: Human motion prediction in motion capture applications, e.g., optical and inertial systems, is challenging because of the complexity of human motion sequences. Current studies do not sufficiently analyze the latent motion information in a given motion sequence, such as motion trends, transient changes, and temporal evolution. Meanwhile, methods that use simple graph convolution networks suffer from over-smoothing, which causes the predicted poses to stay invariant in long-term prediction. To address these challenges, we propose a multi-feature-based orthogonal graph convolution network (MFOGCN), whose multi-feature extraction consists of two key modules: (1) a hybrid spectral transform, which captures local transient features and global motion trends of motion sequences via the discrete wavelet transform while considering temporal smoothing between human joints, and (2) mask-aware multiple attention, which uses sliding time windows to extract motion sequence feature representations from multiple historical subsequences, refining the correlation between adjacent poses while capturing global dependencies between sequences. In addition, we propose orthogonal graph convolution and an orthogonal loss for the prediction network, which help stabilize the feature transformation of the graph convolution and thereby resolve the over-smoothing issue. An extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that the proposed MFOGCN outperforms other approaches.
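
The abstract does not state the form of the orthogonal loss. As a rough illustration only, the sketch below assumes a common soft orthogonality penalty, the squared Frobenius norm of W^T W - I, applied to a graph convolution weight matrix W. The function names (`matmul`, `transpose`, `orthogonal_penalty`) and the penalty form are assumptions made for this example and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonal_penalty(w):
    """Assumed penalty: squared Frobenius norm of (W^T W - I).

    The value is zero exactly when the columns of W are orthonormal,
    and grows as W drifts away from an orthogonal transformation.
    """
    gram = matmul(transpose(w), w)
    n = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


identity = [[1.0, 0.0], [0.0, 1.0]]   # orthogonal weight matrix
scaled = [[2.0, 0.0], [0.0, 2.0]]     # stretches features, not orthogonal

print(orthogonal_penalty(identity))   # 0.0
print(orthogonal_penalty(scaled))     # 18.0
```

Adding a term like this to the training objective would discourage the weight matrix from stretching or collapsing features as layers are stacked, which is the kind of stabilizing effect the abstract attributes to the orthogonal loss.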