Abstract: 3D human shape reconstruction under severe occlusion due to human-object or human-human interaction is a challenging problem. Parametric models such as SMPL(-X), which are based on statistics across human shapes, can represent whole human body shapes but are limited to minimally clothed human shapes. Implicit-function-based methods extract features from the parametric models to employ prior knowledge of human bodies and can capture geometric details such as clothing and hair. However, they often struggle to handle misaligned parametric models and to inpaint occluded regions given a single RGB image. In this work, we propose a novel pipeline, MHCDIFF (Multi-hypotheses Conditioned Point Cloud Diffusion), composed of point cloud diffusion conditioned on probabilistic distributions for pixel-aligned detailed 3D human reconstruction under occlusion. Compared to previous implicit-function-based methods, the point cloud diffusion model can capture globally consistent features to generate the occluded regions, and the denoising process corrects the misaligned SMPL meshes. The core of MHCDIFF is extracting local features from multiple hypothesized SMPL(-X) meshes and aggregating the set of features to condition the diffusion model. In experiments on the CAPE and MultiHuman datasets, the proposed method outperforms various SOTA methods based on SMPL, implicit functions, point cloud diffusion, and their combinations, under synthetic and real occlusions. Our code is publicly available at https://donghwankim0101.github.io/projects/mhcdiff/.

What problem does this paper attempt to address?

The paper addresses the reconstruction of pixel-aligned, detailed 3D human shapes from a single RGB image under severe occlusion caused by human-object or human-human interaction. Parametric models such as SMPL and its variants can represent the whole body shape, but they are limited to minimally clothed bodies and handle loose clothing poorly. Implicit-function-based methods can capture geometric details, but they often struggle with misaligned parametric models and with completing occluded regions from a single RGB image. To overcome these limitations and achieve more accurate and robust 3D human reconstruction, the paper proposes Multi-hypotheses Conditioned Point Cloud Diffusion (MHCDIFF).

The main contributions of the paper are:

1. **Multi-Hypothesis Conditioning Mechanism**: A novel conditioning mechanism captures the distribution of multiple plausible SMPL meshes, which makes the method robust to noise in any single SMPL estimate. To the best of the authors' knowledge, this is the first work to extend multi-hypothesis SMPL estimation to pixel-aligned 3D human reconstruction.
2. **Point Cloud Diffusion Model**: A point cloud diffusion model captures globally consistent features and completes invisible parts. Unlike previous implicit-function-based methods, it can correct misaligned SMPL estimates during the denoising process and generate detailed human meshes.
3. **Performance Improvement**: Trained on synthetic partial-body images, MHCDIFF outperforms previous methods on both occluded and full-body images.

The paper evaluates MHCDIFF on the CAPE and MultiHuman datasets and reports superior performance across different occlusion ratios. It also presents ablation studies on the individual components, including the multi-hypothesis conditioning, the local feature extraction, and the training strategy.
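The idea of conditioning on a set of hypotheses rather than a single mesh can be illustrated with a small NumPy sketch. This is not the authors' implementation: the per-point feature (distance to the nearest vertex of each hypothesized mesh), the mean/variance pooling, and the function names `nearest_vertex_distance` and `aggregate_hypotheses` are all assumptions chosen for illustration. The only property the sketch is meant to show is that the pooled conditioning signal does not depend on the order of the hypotheses.

```python
import numpy as np


def nearest_vertex_distance(points, hypotheses):
    """Distance from each point to the nearest vertex of each hypothesis.

    points:     (N, 3) array, e.g. the noisy point cloud at one diffusion step.
    hypotheses: (H, V, 3) array, vertices of H hypothesized body meshes.
    Returns an (N, H) array of distances.
    (Hypothetical feature choice, not taken from the paper.)
    """
    # Broadcast to (H, N, V, 3), then reduce over coordinates and vertices.
    diff = points[None, :, None, :] - hypotheses[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # (H, N, V)
    return dist.min(axis=-1).T             # (N, H)


def aggregate_hypotheses(features):
    """Pool per-hypothesis features into an order-invariant summary.

    features: (N, H) array with one column per hypothesis.
    Returns an (N, 2) array holding the mean and variance across hypotheses.
    (Assumed pooling; the paper may aggregate differently.)
    """
    return np.stack([features.mean(axis=1), features.var(axis=1)], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_points = rng.normal(size=(5, 3))       # stand-in for a noisy point cloud
    body_hypotheses = rng.normal(size=(4, 10, 3))  # stand-in for 4 meshes, 10 vertices each
    condition = aggregate_hypotheses(
        nearest_vertex_distance(noisy_points, body_hypotheses)
    )
    print(condition.shape)
```

In a full pipeline, a summary like `condition` would be concatenated to each point's features before the denoising network is applied; the variance channel gives the network a rough signal of where the hypotheses disagree, which is typically the occluded region.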

Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Di^2Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution.

Generative Approach for Probabilistic Human Mesh Recovery using Diffusion Models

MHPro: Multi-hypothesis Probabilistic Modeling for Human Mesh Recovery

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

MH‐HMR: Human mesh recovery from monocular images via multi‐hypothesis learning

A Conditional Diffusion Model for 3D Human Pose Estimation

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

3D Human Reconstruction from A Single Depth Image

DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model

PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction

DiHuR: Diffusion-Guided Generalizable Human Reconstruction

CenterHMR: Multi-Person Center-based Human Mesh Recovery

PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion

VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds

Distribution-Aligned Diffusion for Human Mesh Recovery

3d human pose estimation based on conditional dual-branch diffusion

HDPose: Post-Hierarchical Diffusion with Conditioning for 3D Human Pose Estimation