An Efficient Graph Transformer Network for Video-Based Human Mesh Reconstruction.

Tao Tang, Yingxuan You, Ti Wang, Hong Liu
DOI: https://doi.org/10.1007/978-981-99-8850-1_17
2024-01-01
Abstract: Although existing image-based methods for 3D human mesh reconstruction have achieved remarkable accuracy, effectively capturing smooth human motion from monocular video remains a significant challenge. Recent video-based methods for human mesh reconstruction tend to build increasingly complex networks to capture the temporal information of human motion, which results in a large number of parameters and limits their practical applications. To address this issue, we propose an Efficient Graph Transformer network to Reconstruct 3D human mesh from monocular video, named EGTR. Specifically, we present a temporal redundancy removal module that uses 1D convolution to eliminate redundant information among video frames, and a spatial-temporal fusion module that combines Modulated GCN with a transformer framework to capture human motion. Our method achieves better accuracy than the state-of-the-art video-based method TCMR on the 3DPW, Human3.6M and MPI-INF-3DHP datasets while using only 8.7% of its parameters, indicating the effectiveness of our method for practical applications.
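
As a rough illustration of the idea behind the temporal redundancy removal module, a strided 1D convolution along the time axis can condense a sequence of per-frame features into a shorter one. The abstract does not give the kernel size, stride, or feature dimensions, so everything below is an assumption: the function name `temporal_conv1d` is hypothetical, and a fixed averaging kernel shared across channels stands in for the learned weights the actual module would use.

```python
from typing import List


def temporal_conv1d(
    frames: List[List[float]], kernel: List[float], stride: int = 1
) -> List[List[float]]:
    """Slide a 1D kernel along the time axis of per-frame feature vectors.

    Illustrative sketch only: the same kernel is applied to every feature
    channel and no padding is used. A learned module would have its own
    weights per channel; those details are not given in the abstract.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    k = len(kernel)
    t = len(frames)
    if k == 0 or t < k:
        raise ValueError("need at least len(kernel) frames and a non-empty kernel")
    dim = len(frames[0])

    condensed = []
    # Each output step summarizes k neighbouring frames; a stride > 1
    # shortens the sequence, which drops overlapping (redundant) frames.
    for start in range(0, t - k + 1, stride):
        window = frames[start:start + k]
        condensed.append(
            [sum(kernel[j] * window[j][c] for j in range(k)) for c in range(dim)]
        )
    return condensed


# Assumed toy input: 16 frames with 2 features each.
frames = [[float(i), 2.0 * i] for i in range(16)]
condensed = temporal_conv1d(frames, kernel=[1 / 3, 1 / 3, 1 / 3], stride=2)
print(len(frames), "frames reduced to", len(condensed))
```

With a kernel of size 3 and a stride of 2, the 16 input frames are reduced to 7 temporal features, each a local average of three neighbouring frames.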