PVA-GCN: point-voxel absorbing graph convolutional network for 3D human pose estimation from monocular video
Minghao Liu, Wenshan Wang, Wei Zhao
DOI: https://doi.org/10.1007/s11760-024-03028-0
IF: 1.583
2024-02-17
Signal, Image and Video Processing
Abstract: 3D human pose estimation has progressed substantially over the past few decades, with notable advances in both accuracy and applications. Recent research primarily aims to improve the performance of estimated poses, but evaluating these estimates against ground-truth pose data remains challenging. Our findings emphasize the pivotal role of refining 2D pose data, or integrating more advanced 2D pose detectors, in improving the quality of the estimated poses: the accuracy of the 2D pose input correlates positively with the precision of the final 3D pose estimate. To reduce computational complexity, techniques such as OctreeGrid filtering and VoxelGraph construction are employed. OctreeGrid filtering organizes the data in a hierarchical octree structure, which facilitates the extraction of essential joint points and voxel representations. VoxelGraphs capture spatiotemporal relationships within point clouds and voxels, improving the model's understanding of 3D spatial configurations. Our model, PVA-GCN, was extensively evaluated on the Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmark datasets, where it surpassed existing state-of-the-art methods. These results indicate that the model is robust across diverse datasets and scenarios. Overall, this research advances 3D human pose estimation by leveraging ground-truth data to enhance pose estimation quality, laying a foundation for future developments in the field.
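The abstract describes OctreeGrid filtering and VoxelGraph construction only at a high level. What follows is a generic illustration under our own assumptions, not the authors' implementation. Points from each frame are snapped to the leaf cells of an octree of fixed depth `depth` over the sequence's bounding box, and each occupied cell keeps the centroid of its points. A graph is then built whose nodes are (frame, cell) pairs. Spatial edges join 26-adjacent occupied cells within a frame, and temporal edges join the same cell in consecutive frames. The names `voxelize` and `build_voxel_graph`, and these neighbourhood rules, are hypothetical.

```python
from collections import defaultdict
from itertools import product

import numpy as np


def voxelize(points, origin, cell, n):
    """Hypothetical OctreeGrid-style filter: snap points to octree leaf cells
    (n = 2**depth cells per axis) and keep one centroid per occupied cell."""
    idx = np.clip(np.floor((points - origin) / cell).astype(int), 0, n - 1)
    buckets = defaultdict(list)
    for row, p in zip(idx, points):
        buckets[tuple(int(c) for c in row)].append(p)
    return {k: np.mean(v, axis=0) for k, v in buckets.items()}


def build_voxel_graph(frames, depth):
    """Hypothetical VoxelGraph: nodes are (frame, cell) pairs; edges link
    26-adjacent cells in one frame and the same cell in consecutive frames."""
    all_pts = np.concatenate(frames, axis=0)
    origin = all_pts.min(axis=0)
    extent = np.maximum(all_pts.max(axis=0) - origin, 1e-9)  # avoid zero-size axes
    n = 2 ** depth
    cell = extent / n  # one shared grid so cell indices match across frames
    voxels = [voxelize(f, origin, cell, n) for f in frames]

    nodes = [(t, k) for t, vox in enumerate(voxels) for k in sorted(vox)]
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    edges = set()
    for t, vox in enumerate(voxels):
        for k in vox:
            # Spatial edges: occupied neighbours in the 26-neighbourhood
            for d in offsets:
                nb = (k[0] + d[0], k[1] + d[1], k[2] + d[2])
                if nb in vox:
                    edges.add(tuple(sorted([(t, k), (t, nb)])))
            # Temporal edge: the same cell is also occupied in the next frame
            if t + 1 < len(voxels) and k in voxels[t + 1]:
                edges.add(((t, k), (t + 1, k)))
    return nodes, edges, voxels
```

A single origin and cell size for the whole sequence keeps cell indices comparable across frames, and the temporal edges depend on that. A real octree would subdivide only occupied cells and could vary its depth locally. This sketch fixes the depth for simplicity.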
engineering, electrical & electronic; imaging science & photographic technology