Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie
DOI: https://doi.org/10.48550/arxiv.2309.01365
2023-01-01
Abstract: Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture. With the success of transformers, we introduce the Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends intra-block temporal modeling via its Temporal Pyramidal Compression-and-Amplification (TPCA) structure and refines inter-block feature interaction with a Cross-Layer Refinement (XLR) module. In particular, the TPCA block exploits a temporal pyramid paradigm, reinforcing key and value representation capabilities and seamlessly extracting spatial semantics from motion sequences. We stitch these TPCA blocks with XLR, which promotes rich semantic representation through continuous interaction of queries, keys, and values. This strategy embodies early-stage information with current flows, addressing typical deficits in detail and stability seen in other transformer-based methods. We demonstrate the effectiveness of RTPCA by achieving state-of-the-art results on Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks with minimal computational overhead. The source code is available at https://github.com/hbing-l/RTPCA.