Abstract: To combine the advantages of deterministic and probabilistic 3D human pose estimation methods, we decompose pose estimation into two processes: hypotheses generation and hypotheses aggregation. For hypotheses generation, we propose a novel Diffusion-based 3D Pose generation (D3DP) method. D3DP generates a diverse set of plausible 3D pose hypotheses from a single 2D keypoint observation. Through a diffusion process, it gradually corrupts ground-truth 3D poses toward a random distribution, and then uses a denoiser conditioned on the observed 2D keypoints to recover the uncorrupted 3D poses. Moreover, D3DP is compatible with existing deterministic 3D pose estimators and lets users tune the trade-off between computational efficiency and pose accuracy via two adjustable parameters. For hypotheses aggregation, we propose two alternative approaches: a Reprojection-Based Selection (RBS) method and a Hypotheses Selection Network (HSN). Both methods adopt a joint-level strategy to assemble the multiple hypotheses generated by D3DP into a single 3D pose for practical use. Specifically, RBS reprojects the 3D pose hypotheses onto the 2D camera plane and, for each joint, selects the hypothesis with the smallest reprojection error. HSN scores each hypothesis and, for each joint, selects the one with the highest confidence score. The selected joints are then combined into the final pose. This joint-by-joint aggregation capitalizes on the 2D prior and temporal information, both of which have been ignored by previous pose-level methods. Extensive experiments on two benchmarks show that the proposed method outperforms state-of-the-art deterministic and probabilistic approaches.
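
The joint-level, reprojection-based selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole camera model, the function names `project` and `joint_level_rbs`, and the parameters `focal` and `center` are assumptions introduced here for illustration only, and the use of temporal information is not modeled.

```python
import math

def project(point3d, focal=1.0, center=(0.0, 0.0)):
    """Project a camera-space 3D point to 2D with a pinhole model (assumed camera model)."""
    x, y, z = point3d
    return (focal * x / z + center[0], focal * y / z + center[1])

def joint_level_rbs(hypotheses, keypoints_2d, focal=1.0, center=(0.0, 0.0)):
    """Assemble one pose by picking, for each joint, the hypothesis whose
    reprojection lies closest to the observed 2D keypoint.

    hypotheses:   list of H poses, each a list of J (x, y, z) tuples
    keypoints_2d: list of J observed (u, v) tuples
    """
    final_pose = []
    for j, observed in enumerate(keypoints_2d):
        # The hypothesis is chosen per joint, not per pose.
        best = min(
            hypotheses,
            key=lambda pose: math.dist(project(pose[j], focal, center), observed),
        )
        final_pose.append(best[j])
    return final_pose

# Toy example: two hypotheses, two joints.
hypotheses = [
    [(0.0, 0.0, 2.0), (1.0, 1.0, 2.0)],  # matches the observation at joint 0
    [(0.2, 0.0, 2.0), (0.4, 0.4, 2.0)],  # matches the observation at joint 1
]
keypoints_2d = [(0.0, 0.0), (0.2, 0.2)]
final_pose = joint_level_rbs(hypotheses, keypoints_2d)
```

In this toy case the assembled pose takes joint 0 from the first hypothesis and joint 1 from the second, which a pose-level selection (choosing one whole hypothesis) could not do.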

Diffusion-Based Hypotheses Generation and Joint-Level Hypotheses Aggregation for 3D Human Pose Estimation
