KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

Jihua Peng, Yanghong Zhou, P.Y. Mok
2024-04-02
Abstract: This paper presents a novel Kinematics and Trajectory Prior Knowledge-Enhanced Transformer (KTPFormer), which overcomes a weakness of existing transformer-based methods for 3D human pose estimation: the Q, K, V vectors in their self-attention mechanisms are all derived by simple linear mapping. We propose two prior attention modules, namely Kinematics Prior Attention (KPA) and Trajectory Prior Attention (TPA), which exploit the known anatomical structure of the human body and motion trajectory information to facilitate effective learning of global dependencies and features in multi-head self-attention. KPA models kinematic relationships in the human body by constructing a kinematic topology, while TPA builds a trajectory topology to learn joint motion trajectory information across frames. By yielding Q, K, V vectors with prior knowledge, the two modules enable KTPFormer to model spatial and temporal correlations simultaneously. Extensive experiments on three benchmarks (Human3.6M, MPI-INF-3DHP and HumanEva) show that KTPFormer achieves superior performance compared with state-of-the-art methods. More importantly, the KPA and TPA modules have lightweight plug-and-play designs and can be integrated into various transformer-based networks (e.g., diffusion-based ones) to improve performance with only a very small increase in computational overhead. The code is available at:
Computer Vision and Pattern Recognition
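The abstract's central idea, deriving Q, K, V from tokens that already carry a structural prior instead of from plain linear maps of the raw tokens, can be illustrated with a minimal Python sketch of one plausible reading of KPA. Here joint features are first aggregated over a kinematic graph (a symmetrically normalized adjacency with self-loops, as in common graph convolutions) and only then projected to Q, K and V for ordinary single-head scaled dot-product attention. The 17-joint parent list (the widely used Human3.6M ordering), the normalization, the single head and the names `kinematic_adjacency`, `prior_attention` and `random_matrix` are all assumptions for illustration, not the authors' implementation.

```python
import math
import random

# ASSUMPTION: Human3.6M 17-joint skeleton in the commonly used VideoPose3D ordering.
# PARENTS[j] is the parent of joint j; -1 marks the root (pelvis). Not from the paper.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]


def kinematic_adjacency(parents):
    """Bone graph with self-loops, symmetrically normalized: D^-1/2 (A + I) D^-1/2."""
    n = len(parents)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:  # each bone links a child joint to its parent, in both directions
            a[child][parent] = a[parent][child] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product (rows of x times columns of y)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def softmax_rows(x):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in x:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def prior_attention(x, adj, wq, wk, wv):
    """Single-head attention whose Q, K, V are projected from graph-aggregated tokens.

    x: J x C joint features, adj: J x J normalized topology, wq/wk/wv: C x C weights.
    The adj @ x step injects the structural prior before the linear projections.
    """
    h = matmul(adj, x)  # each joint is mixed with its kinematic neighbours
    q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
    k_t = [list(col) for col in zip(*k)]
    scale = 1.0 / math.sqrt(len(wq[0]))
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def random_matrix(rows, cols, std, rng):
    """Hypothetical helper: Gaussian-initialized nested-list matrix."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    num_joints, channels = len(PARENTS), 8
    x = random_matrix(num_joints, channels, 1.0, rng)
    wq, wk, wv = (random_matrix(channels, channels, 0.3, rng) for _ in range(3))
    out = prior_attention(x, kinematic_adjacency(PARENTS), wq, wk, wv)
    print(len(out), len(out[0]))  # 17 8
```

Running the file prints the output shape (17 joints by 8 channels). Nested lists keep the sketch dependency-free; a practical module would use a tensor library and multiple attention heads.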
What problem does this paper attempt to address?
This paper addresses a limitation of existing Transformer-based methods for 3D human pose estimation: the Q, K, V vectors in their self-attention mechanisms are generated only by simple linear mappings, which limits their ability to model the spatial relationships between joints and the motion trajectory information over time. As a result, these methods struggle to fully capture both the spatial correlations imposed by human anatomy and the temporal correlations of joint motion trajectories, which hurts model performance. To address this, the paper proposes the Kinematics and Trajectory Prior Knowledge-Enhanced Transformer (KTPFormer). It introduces two prior attention modules, Kinematics Prior Attention (KPA) and Trajectory Prior Attention (TPA), which use known information about human anatomical structure and motion trajectories to strengthen global dependency and feature learning in the multi-head self-attention mechanism. KPA models the kinematic relationships between joints by constructing a human kinematic topology, while TPA learns joint motion trajectory information across frames by constructing a trajectory topology. Together, the two modules let KTPFormer model spatial and temporal correlations simultaneously, and it outperforms existing state-of-the-art methods on three benchmark datasets (Human3.6M, MPI-INF-3DHP and HumanEva). In addition, KPA and TPA are lightweight, plug-and-play modules that can be embedded into various Transformer-based networks, including diffusion-based ones, improving performance with only a very small increase in computational overhead.
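The trajectory topology described for TPA can be illustrated with a minimal Python sketch under explicit assumptions: each joint's trajectory over T frames is treated as a chain graph in which frame t is linked to itself and to frames t-1 and t+1, the frame tokens are aggregated over that normalized graph, and Q, K, V projected from the aggregated tokens drive scaled dot-product attention along the time axis, separately for every joint. The chain topology, the weight sharing across joints and the names `trajectory_adjacency`, `temporal_prior_attention` and `tpa_layer` are hypothetical choices; the paper's TPA may define its trajectory topology differently.

```python
import math
import random


def trajectory_adjacency(num_frames):
    """ASSUMED chain topology over time: frame t links to t-1, t and t+1.

    The matrix built below already includes self-loops (A + I); it is then
    symmetrically normalized as D^-1/2 (A + I) D^-1/2.
    """
    a = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(num_frames)]
         for i in range(num_frames)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_frames)]
            for i in range(num_frames)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def temporal_prior_attention(traj, adj, wq, wk, wv):
    """Attention along one joint's trajectory (T x C) with prior-aggregated Q, K, V."""
    h = matmul(adj, traj)  # blend each frame with its temporal neighbours
    q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
    scale = 1.0 / math.sqrt(len(wq[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(weights[t] * v[t][c] for t in range(len(v))) / z
                    for c in range(len(v[0]))])
    return out


def tpa_layer(seq, wq, wk, wv):
    """seq: J x T x C (joints, frames, channels).

    Each joint's trajectory is processed independently, sharing the same
    projection weights across joints (an assumption of this sketch).
    """
    adj = trajectory_adjacency(len(seq[0]))
    return [temporal_prior_attention(traj, adj, wq, wk, wv) for traj in seq]


if __name__ == "__main__":
    rng = random.Random(0)
    joints, frames, channels = 17, 9, 4
    seq = [[[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(frames)]
           for _ in range(joints)]
    wq, wk, wv = ([[rng.gauss(0.0, 0.3) for _ in range(channels)] for _ in range(channels)]
                  for _ in range(3))
    out = tpa_layer(seq, wq, wk, wv)
    print(len(out), len(out[0]), len(out[0][0]))  # 17 9 4
```

Because the prior enters only through the aggregation step before the projections, the attention computation itself is unchanged, which is consistent with the plug-and-play framing above; the actual parameter and compute overhead of KTPFormer's modules is not reproduced by this toy.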