Abstract: Recently, diffusion-based methods for monocular 3D human pose estimation have achieved state-of-the-art (SOTA) performance by directly regressing the 3D joint coordinates from the 2D pose sequence. Although some methods decompose the task into bone length and bone direction prediction based on the human anatomical skeleton to explicitly incorporate more human body prior constraints, their performance is significantly lower than that of the SOTA diffusion-based methods. This can be attributed to the tree structure of the human skeleton: directly applying the disentangled formulation can amplify hierarchical errors, which accumulate as they propagate from parent joints to child joints through each level of the hierarchy. Meanwhile, previous methods have not fully explored this hierarchical information. To address these problems, a Disentangled Diffusion-based 3D Human Pose Estimation method with Hierarchical Spatial and Temporal Denoiser, termed DDHPose, is proposed. In our approach: (1) We disentangle the 3D pose and diffuse the bone length and bone direction during the forward process of the diffusion model to effectively model the human pose prior. A disentanglement loss is proposed to supervise diffusion model learning. (2) For the reverse process, we propose the Hierarchical Spatial and Temporal Denoiser (HSTDenoiser) to improve the hierarchical modeling of each joint. HSTDenoiser comprises two components: the Hierarchical-Related Spatial Transformer (HRST) and the Hierarchical-Related Temporal Transformer (HRTT). HRST exploits joint spatial information and the influence of the parent joint on each joint for spatial modeling, while HRTT uses information from both the joint and its hierarchically adjacent joints to explore the hierarchical temporal correlations among joints.
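The disentanglement described in the abstract can be illustrated with a minimal sketch. This is not the DDHPose implementation: the 5-joint skeleton, the `PARENTS` array, and the function names `decompose` and `recompose` are hypothetical, chosen only to show how a pose splits into bone lengths and unit bone directions, and how an error in one bone carries over to its descendants when the pose is rebuilt along the tree.

```python
import numpy as np

# Hypothetical toy skeleton (not the one used in the paper).
# PARENTS[j] is the parent of joint j; -1 marks the root at index 0.
# Parents are assumed to appear before their children.
PARENTS = [-1, 0, 1, 0, 3]

def decompose(pose, parents=PARENTS):
    """Split a (J, 3) pose into bone lengths (J,) and unit bone directions (J, 3)."""
    pose = np.asarray(pose, dtype=float)
    lengths = np.zeros(len(parents))
    directions = np.zeros((len(parents), 3))
    for j, p in enumerate(parents):
        if p < 0:
            continue  # the root has no incoming bone
        bone = pose[j] - pose[p]
        lengths[j] = np.linalg.norm(bone)
        if lengths[j] > 0:
            directions[j] = bone / lengths[j]
    return lengths, directions

def recompose(root, lengths, directions, parents=PARENTS):
    """Rebuild joint positions by walking the tree from the root outward."""
    pose = np.zeros((len(parents), 3))
    pose[0] = root
    for j, p in enumerate(parents):
        if p >= 0:
            # Each joint is placed relative to its parent, so any error in
            # the parent's position is inherited by the child.
            pose[j] = pose[p] + lengths[j] * directions[j]
    return pose

pose = np.array([
    [0.0, 0.0, 0.0],   # root
    [0.0, 1.0, 0.0],   # child of root
    [0.0, 2.0, 0.0],   # grandchild (chain 0 -> 1 -> 2)
    [1.0, 0.0, 0.0],   # child of root
    [2.0, 0.0, 0.0],   # grandchild (chain 0 -> 3 -> 4)
])
lengths, directions = decompose(pose)
rebuilt = recompose(pose[0], lengths, directions)

# Perturb only the direction of the bone ending at joint 1.
noisy = directions.copy()
noisy[1] = np.array([1.0, 0.0, 0.0])
drifted = recompose(pose[0], lengths, noisy)
errors = np.linalg.norm(drifted - pose, axis=1)
```

In this sketch, `rebuilt` matches the input pose, while `errors` is non-zero for joint 1 and for its descendant joint 2 even though only one bone direction was changed. The other branch (joints 3 and 4) is unaffected. This is the kind of hierarchical error propagation the abstract refers to.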

Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation

Di^2Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

DiffPose: Reliable 2D Pose Estimation Through Denoising Diffusion

A Conditional Diffusion Model for 3D Human Pose Estimation

DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model

STN-enhanced Message Passing Guided by Adversarial Learning for Human Pose Estimation

Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser

3D Human Pose Estimation Based on Conditional Dual-Branch Diffusion Model

3d human pose estimation based on conditional dual-branch diffusion

Diffusion-Based Pose Refinement and Multi-Hypothesis Generation for 3D Human Pose Estimation

Diffusion-Based Hypotheses Generation and Joint-Level Hypotheses Aggregation for 3D Human Pose Estimation

Diffusion Based Coarse-to-Fine Network for 3D Human Pose and Shape Estimation from Monocular Video

Diffusion-based Pose Refinement and Multi-hypothesis Generation for 3D Human Pose Estimation

D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement

HDPose: Post-Hierarchical Diffusion with Conditioning for 3D Human Pose Estimation

DiffPose: Toward More Reliable 3D Pose Estimation

Research Progress of Deep Learning Methods in Two-dimensional Human Pose Estimation

3D Human Pose Analysis via Diffusion Synthesis

3D Human Pose Estimation Via Human Structure-Aware Fully Connected Network

Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?

Learning Positional Priors for Pretraining 2D Pose Estimators