DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu
2024-04-02
Abstract: The recovery of occluded human meshes presents challenges for current methods due to the difficulty in extracting effective image features under severe occlusion. In this paper, we introduce DPMesh, an innovative framework for occluded human mesh recovery that capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model. Unlike previous methods reliant on conventional backbones for vanilla feature extraction, DPMesh seamlessly integrates the pre-trained denoising U-Net with potent knowledge as its image backbone and performs a single-step inference to provide occlusion-aware information. To enhance the perception capability for occluded poses, DPMesh incorporates well-designed guidance via condition injection, which produces effective controls from 2D observations for the denoising U-Net. Furthermore, we explore a dedicated noisy key-point reasoning approach to mitigate disturbances arising from occlusion and crowded scenarios. This strategy fully unleashes the perceptual capability of the diffusion prior, thereby enhancing accuracy. Extensive experiments affirm the efficacy of our framework, as we outperform state-of-the-art methods on both occlusion-specific and standard datasets. The persuasive results underscore its ability to achieve precise and robust 3D human mesh recovery, particularly in challenging scenarios involving occlusion and crowded scenes.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper tackles the recovery of 3D human meshes from monocular images in difficult scenarios such as occlusion and crowded scenes. Existing methods rely on traditional 2D alignment techniques and conventional feature-extraction backbones, which perform poorly under severe occlusion. To overcome these limitations, the paper proposes the DPMesh framework, which uses a pre-trained diffusion model to extract strong prior knowledge about object structure and spatial relationships, enabling more accurate and robust human mesh recovery.

### Main Contributions

1. **Innovative Framework**: DPMesh uses the denoising U-Net of a pre-trained text-to-image diffusion model as its image backbone and runs it in a single inference step, exploiting its prior knowledge of object structure and spatial relationships to give robust estimates under occlusion.
2. **Condition Injection**: A condition injection mechanism feeds pre-detected 2D keypoint information into the diffusion model, improving the model's perception of occluded poses.
3. **Noisy Keypoint Reasoning**: A noisy keypoint reasoning method makes the model more robust to noisy 2D observations, which keeps predictions stable under occlusion and in crowded scenes.

### Experimental Results

Experiments show that DPMesh outperforms existing methods on occlusion-specific benchmarks, including 3DPW-OC, 3DPW-PC, and 3DOH, and is particularly strong under severe occlusion and in crowded scenes. Results are reported as MPJPE (mean per-joint position error) and PA-MPJPE (Procrustes-aligned MPJPE), where lower is better:

- On the 3DPW-OC dataset, DPMesh achieved an MPJPE of 70.9 mm and a PA-MPJPE of 48.0 mm.
- On the 3DOH dataset, DPMesh achieved an MPJPE of 97.1 mm and a PA-MPJPE of 59.0 mm.

These results demonstrate the effectiveness and robustness of DPMesh in complex occlusion scenarios.
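
### How the Reported Metrics Are Computed (Illustrative)

The two metrics quoted above have standard definitions, and a small sketch may help in reading the numbers. MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints; PA-MPJPE is the same error after the prediction has been aligned to the ground truth with a similarity transform (scale, rotation, translation). The code below is a minimal NumPy sketch of those definitions, not the evaluation code used by the paper. The function names, the 14-joint layout, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best scale, rotation, translation."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    x, y = pred - mu_pred, gt - mu_gt
    # The SVD of the cross-covariance yields the optimal rotation (Kabsch/Umeyama).
    u, s, vt = np.linalg.svd(x.T @ y)
    # Flip the last singular direction if needed so the rotation has det = +1.
    d = np.sign(np.linalg.det(u @ vt))
    corr = np.array([1.0, 1.0, d])
    rot = u @ np.diag(corr) @ vt
    scale = (s * corr).sum() / (x ** 2).sum()
    return scale * x @ rot + mu_gt


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes (similarity) alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)


# Synthetic check (hypothetical data): pred is gt under a similarity transform,
# so MPJPE is large while PA-MPJPE is numerically zero.
rng = np.random.default_rng(0)
gt = rng.normal(size=(14, 3))  # 14 joints is an illustrative choice
theta = np.pi / 6
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
pred = 1.2 * gt @ rot_z + np.array([0.1, -0.2, 0.3])

print("MPJPE:", mpjpe(pred, gt))
print("PA-MPJPE:", pa_mpjpe(pred, gt))
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE isolates errors in the articulated pose itself, which is why it is always less than or equal to MPJPE in the results above.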