Diffusion Based Coarse-to-Fine Network for 3D Human Pose and Shape Estimation from Monocular Video

Chuqiao Wu, Haitao Huang, Wenming Yang
DOI: https://doi.org/10.1109/icme57554.2024.10687919
2024-01-01
Abstract: Video-based 3D human pose and shape estimation plays a crucial role in machine understanding of humans. However, existing methods typically use a single unified model to estimate both pose and shape parameters, neglecting the inherent uncertainty that factors such as blurring and occlusion introduce into pose estimation. This oversight can lead to suboptimal solutions, especially in challenging scenarios. To tackle this issue, we propose a Coarse-to-Fine Diffusion-based Refinement Network (DR-Net). The initial regressor is pre-trained on large datasets to learn human motion dynamics. In the refinement framework, a diffusion-based refinement regressor uses reverse denoising to incrementally refine the pose parameters. To capture the kinematics of human motion and model parameter-feature relationships, we design the GCN-ATT module as the denoiser within the diffusion-based regressor. Extensive experiments demonstrate the superiority of DR-Net over state-of-the-art methods on the benchmark datasets Human3.6M [1] and 3DPW [2].
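The coarse-to-fine idea in the abstract (a coarse pose estimate refined step by step through reverse denoising) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear noise schedule, the deterministic DDIM-style update, the choice to start from a noised copy of the coarse estimate, the clean-pose (x0) prediction target, and the names `make_alpha_bars`, `refine_pose`, and `denoiser`. The `denoiser` argument is a hypothetical stand-in for a learned module such as GCN-ATT.

```python
import numpy as np


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def refine_pose(coarse_pose, denoiser, alpha_bars, rng):
    """Refine a coarse pose with a deterministic DDIM-style reverse process.

    `denoiser(x_t, t, coarse_pose)` is a hypothetical callable that returns
    its estimate of the clean pose given the noisy pose at step t.
    """
    num_steps = len(alpha_bars)

    # Assumption: start from the coarse estimate diffused to the last step.
    noise = rng.standard_normal(coarse_pose.shape)
    x = (np.sqrt(alpha_bars[-1]) * coarse_pose
         + np.sqrt(1.0 - alpha_bars[-1]) * noise)

    for t in range(num_steps - 1, -1, -1):
        # Predict the clean pose, then recover the noise that predicts.
        x0_hat = denoiser(x, t, coarse_pose)
        eps_hat = (x - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])

        # Step to the previous noise level; the final step has no noise left.
        abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_hat

    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coarse = np.zeros((24, 3))        # illustrative pose shape, not from the paper
    target = np.full((24, 3), 0.1)    # pretend "true" pose for the toy denoiser
    toy_denoiser = lambda x_t, t, cond: target
    refined = refine_pose(coarse, toy_denoiser, make_alpha_bars(50), rng)
    print(np.abs(refined - target).max())
```

With the toy denoiser above, the loop ends at the denoiser's own prediction, because the last update removes all remaining noise. A learned denoiser would instead condition on video features and the coarse estimate at every step, which is where a module like GCN-ATT would plug in.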