Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Donghwan Kim, Tae-Kyun Kim
2024-10-29
Abstract: 3D human shape reconstruction under severe occlusion due to human-object or human-human interaction is a challenging problem. Parametric models, i.e., SMPL(-X), which are based on the statistics across human shapes, can represent whole human body shapes but are limited to minimally-clothed human shapes. Implicit-function-based methods extract features from the parametric models to employ prior knowledge of human bodies and can capture geometric details such as clothing and hair. However, they often struggle to handle misaligned parametric models and to inpaint occluded regions given a single RGB image. In this work, we propose a novel pipeline, MHCDIFF (Multi-hypotheses Conditioned Point Cloud Diffusion), composed of point cloud diffusion conditioned on probabilistic distributions for pixel-aligned detailed 3D human reconstruction under occlusion. Compared to previous implicit-function-based methods, the point cloud diffusion model can capture globally consistent features to generate the occluded regions, and the denoising process corrects the misaligned SMPL meshes. The core of MHCDIFF is extracting local features from multiple hypothesized SMPL(-X) meshes and aggregating the set of features to condition the diffusion model. In the experiments on the CAPE and MultiHuman datasets, the proposed method outperforms various SOTA methods based on SMPL, implicit functions, point cloud diffusion, and their combinations, under synthetic and real occlusions. Our code is publicly available at https://donghwankim0101.github.io/projects/mhcdiff/.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the reconstruction of pixel-aligned, detailed 3D human shapes from a single RGB image under severe occlusion, where the occlusion is caused by human-object or human-human interaction. Parametric models such as SMPL and its variants can represent the entire human body shape, but they are limited to minimally clothed bodies and handle loose clothing poorly. Implicit-function-based methods can capture geometric details, but they often struggle to handle misaligned parametric models and to complete occluded regions given a single RGB image. To overcome these challenges, the paper proposes Multi-hypotheses Conditioned Point Cloud Diffusion (MHCDIFF), aiming at more accurate and robust 3D human reconstruction. The main contributions of the paper are:

1. **Multi-Hypothesis Conditioning Mechanism**: A novel multi-hypothesis conditioning mechanism captures the distribution of multiple plausible SMPL meshes, which makes the method robust to noise in any single SMPL estimate. To the best of the authors' knowledge, this is the first time that multi-hypothesis SMPL estimates have been extended to pixel-aligned 3D human reconstruction.
2. **Point Cloud Diffusion Model**: A point cloud diffusion model captures globally consistent features and completes the invisible parts. Unlike previous implicit-function methods, this model can correct misaligned SMPL estimates during the denoising process and generate detailed human meshes.
3. **Performance Improvement**: Trained on synthetic partial-body images, MHCDIFF outperforms previous methods on both occluded and full-body images.
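The idea of extracting a local feature from each hypothesized SMPL mesh and aggregating the set can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the choice of nearest-vertex distance as the local feature, and the mean/standard-deviation aggregation are all assumptions made for illustration, since the summary does not specify which features or aggregation rule MHCDIFF uses.

```python
import numpy as np


def nearest_vertex_distance(points, vertices):
    """Distance from each query point to its closest mesh vertex.

    points:   (P, 3) query points (e.g. the noisy point cloud).
    vertices: (V, 3) vertices of one hypothesized mesh.
    Returns:  (P,) array of distances.
    """
    diff = points[:, None, :] - vertices[None, :, :]   # (P, V, 3)
    return np.linalg.norm(diff, axis=-1).min(axis=1)   # (P,)


def aggregate_hypothesis_features(points, hypotheses):
    """Aggregate a per-point feature over several mesh hypotheses.

    hypotheses: (H, V, 3) vertices of H hypothesized meshes.
    Returns:    (P, 2) array holding the mean and standard deviation
                of the per-hypothesis distances (assumed aggregation).
    """
    per_hyp = np.stack(
        [nearest_vertex_distance(points, v) for v in hypotheses], axis=0
    )  # (H, P)
    return np.stack([per_hyp.mean(axis=0), per_hyp.std(axis=0)], axis=-1)


# Toy usage: one query point, two single-vertex "meshes".
points = np.array([[0.0, 0.0, 0.0]])
hypotheses = np.array([[[1.0, 0.0, 0.0]], [[3.0, 0.0, 0.0]]])
features = aggregate_hypothesis_features(points, hypotheses)
```

A high standard deviation marks points where the hypotheses disagree, which is one plausible way for a conditioning signal to express uncertainty in occluded regions.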
The paper verifies the effectiveness of MHCDIFF through experiments on the CAPE and MultiHuman datasets, showing superior performance across different occlusion ratios. The paper also conducts ablation studies to verify the contribution of each component, including the multi-hypothesis conditioning, the local feature extraction, and the training strategy.
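To make the notion of an occlusion ratio concrete, the following is a minimal sketch of synthesizing a partial-body image and measuring how much of the person is hidden. The box-shaped occluder, the function names, and the definition of the ratio as the hidden fraction of foreground pixels are assumptions for illustration; the summary does not describe how the paper generates occlusions or defines the ratio.

```python
import numpy as np


def apply_box_occlusion(image, mask, top, left, height, width):
    """Blank out a rectangle in the image and in the person mask.

    image: (H, W, 3) array; mask: (H, W) boolean foreground mask.
    Returns copies, leaving the inputs untouched.
    """
    occluded_image = image.copy()
    visible_mask = mask.copy()
    occluded_image[top:top + height, left:left + width] = 0
    visible_mask[top:top + height, left:left + width] = False
    return occluded_image, visible_mask


def occlusion_ratio(full_mask, visible_mask):
    """Fraction of foreground pixels that are hidden (assumed definition)."""
    full = full_mask.sum()
    if full == 0:
        return 0.0
    return float(1.0 - visible_mask.sum() / full)


# Toy usage: a 4x4 image fully covered by the person, top half occluded.
image = np.ones((4, 4, 3))
mask = np.ones((4, 4), dtype=bool)
occluded_image, visible_mask = apply_box_occlusion(image, mask, 0, 0, 2, 4)
ratio = occlusion_ratio(mask, visible_mask)
```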