Abstract: The recovery of occluded human meshes presents challenges for current methods due to the difficulty in extracting effective image features under severe occlusion. In this paper, we introduce DPMesh, an innovative framework for occluded human mesh recovery that capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model. Unlike previous methods reliant on conventional backbones for vanilla feature extraction, DPMesh seamlessly integrates the pre-trained denoising U-Net with potent knowledge as its image backbone and performs a single-step inference to provide occlusion-aware information. To enhance the perception capability for occluded poses, DPMesh incorporates well-designed guidance via condition injection, which produces effective controls from 2D observations for the denoising U-Net. Furthermore, we explore a dedicated noisy key-point reasoning approach to mitigate disturbances arising from occlusion and crowded scenarios. This strategy fully unleashes the perceptual capability of the diffusion prior, thereby enhancing accuracy. Extensive experiments affirm the efficacy of our framework, as we outperform state-of-the-art methods on both occlusion-specific and standard datasets. The persuasive results underscore its ability to achieve precise and robust 3D human mesh recovery, particularly in challenging scenarios involving occlusion and crowded scenes.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper tackles the recovery of 3D human meshes from monocular images in difficult scenarios such as occlusion and crowded scenes. Existing methods rely on traditional 2D alignment techniques, which perform poorly under severe occlusion. To overcome this limitation, the paper proposes the DPMesh framework, which uses a pre-trained diffusion model as a source of strong prior knowledge about object structure and spatial relationships, enabling more accurate and robust human mesh recovery.

### Main Contributions

1. **Innovative Framework**: DPMesh uses the denoising U-Net of a pre-trained text-to-image diffusion model as its image backbone, exploiting the model's understanding of object structure and spatial relationships to produce robust estimates under occlusion.
2. **Condition Injection**: A condition injection mechanism feeds pre-detected 2D keypoint information into the diffusion model, improving its perception of occluded poses.
3. **Noisy Keypoint Reasoning**: A noisy keypoint reasoning method makes the model more robust to noisy 2D observations, keeping predictions stable in occluded and crowded scenes.

### Experimental Results

Experiments show that DPMesh performs significantly better than existing methods on several occlusion benchmarks, such as 3DPW-OC, 3DPW-PC, and 3DOH, and does particularly well under severe occlusion and in crowded scenes. Reported metrics include:

- On the 3DPW-OC dataset, DPMesh achieved an MPJPE of 70.9 mm and a PA-MPJPE of 48.0 mm.
- On the 3DOH dataset, DPMesh achieved an MPJPE of 97.1 mm and a PA-MPJPE of 59.0 mm.

These results demonstrate the effectiveness and robustness of DPMesh in complex occlusion scenarios.
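
For readers unfamiliar with the two metrics quoted above, the following is a minimal NumPy sketch of how MPJPE and PA-MPJPE are commonly defined. It is not taken from the DPMesh evaluation code; the function names `mpjpe`, `procrustes_align`, and `pa_mpjpe` are hypothetical, and the sketch assumes predicted and ground-truth joints are given as `(J, 3)` arrays in the same units (for example millimetres). Benchmarks usually also fix a specific joint set and often align the root joint before computing MPJPE, which this sketch leaves out.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Map pred onto gt with the least-squares similarity transform
    (scale, rotation, translation). Hypothetical helper for illustration."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # SVD of the 3x3 cross-covariance between centred point sets.
    U, S, Vt = np.linalg.svd(p.T @ g)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the chosen rotation.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()

    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: MPJPE after rigid alignment plus scaling."""
    return mpjpe(procrustes_align(pred, gt), gt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(17, 3))  # 17 synthetic joints, not real data

    # A prediction that differs from gt only by a similarity transform.
    theta = 0.7
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    pred = 1.5 * gt @ Rz.T + np.array([0.1, -0.2, 0.3])

    print("MPJPE:   ", mpjpe(pred, gt))
    print("PA-MPJPE:", pa_mpjpe(pred, gt))
```

Because PA-MPJPE removes global rotation, translation, and scale before measuring error, it is never larger than MPJPE on the same prediction. That is consistent with the reported pairs above (for example 70.9 mm versus 48.0 mm on 3DPW-OC).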

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
