Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing

Lu Dai, Liqian Ma, Shenhan Qian, Hao Liu, Ziwei Liu, Hui Xiong
DOI: https://doi.org/10.48550/arXiv.2309.16189
2023-09-28
Abstract: In this paper, we define and study a new Cloth2Body problem, whose goal is to generate 3D human body meshes from a 2D clothing image. Unlike the existing human mesh recovery problem, Cloth2Body needs to address new challenges raised by the partial observation of the input and the high diversity of the output. There are three specific challenges: first, how to locate and pose human bodies within the clothes; second, how to effectively estimate body shapes from various clothing types; and finally, how to generate diverse and plausible results from a 2D clothing image. To this end, we propose an end-to-end framework that can accurately estimate a 3D body mesh, parameterized by pose and shape, from a 2D clothing image. We first use Kinematics-aware Pose Estimation to estimate body pose parameters. A 3D skeleton is employed as a proxy, followed by an inverse kinematics module to boost the estimation accuracy. We additionally design an adaptive depth trick that better aligns the re-projected 3D mesh with the 2D clothing image by disentangling the effects of object size and camera extrinsics. Next, we propose Physics-informed Shape Estimation to estimate body shape parameters. 3D shape parameters are predicted from partial body measurements estimated from the RGB image, which not only improves pixel-wise human-cloth alignment but also enables flexible user editing. Finally, we design an Evolution-based Pose Generation method, a skeleton transplanting method inspired by genetic algorithms, to generate diverse and reasonable poses during inference. As shown by experimental results on both synthetic and real-world data, the proposed framework achieves state-of-the-art performance and can effectively recover natural and diverse 3D body meshes from 2D images that align well with the clothing.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of generating 3D human body meshes from 2D clothing images. Specifically, the Cloth2Body task takes a 2D clothing image and aims to generate 3D human body models that align with the clothing at the pixel level and that can cover different poses and body shapes. The task faces three main challenges:

1. **Partial Observation**: The human pixels are missing from a 2D clothing image, so the 3D human body mesh must be inferred from the interaction between clothing and body.
2. **Pixel-level Alignment**: When the 3D human body model is re-projected onto the 2D plane, it needs to align with the 2D clothing image at the pixel level.
3. **Diverse Outputs**: The same 2D clothing image may fit 3D human body models with a variety of poses and body shapes, so this output diversity needs to be modeled.

To solve these problems, the authors propose an end-to-end framework that consists of three main components:

1. **Kinematics-aware Pose Estimation**:
   - Use a neural network to estimate the 3D joint positions \( \mathbf{x} \) and bone torsion angles \( \phi \) from the 2D clothing image.
   - Calculate the rotation matrix \( \theta \) of each joint through inverse kinematics (IK).
   - Introduce an adaptive depth estimation technique that adjusts the camera depth \( z_{\text{cam}} \) through the bone length ratio, improving the alignment between the 3D human body model and the 2D clothing image.
2. **Physics-informed Shape Estimation**:
   - Estimate clothing keypoints from the input 2D clothing image and combine them with the body joints from the pose estimation module to calculate axial and radial body measurements.
   - Use these measurements to estimate the shape parameter \( \beta \) of the SMPL model, improving the accuracy and interpretability of shape estimation.
3. **Evolution-based Pose Generation**:
   - Use the crossover and mutation operations of genetic algorithms to generate diverse and reasonable poses.
   - Generate diverse poses that satisfy the conditions through K-nearest neighbor matching (KNN Matching) and skeleton transplanting.

The experimental results show that the framework achieves state-of-the-art performance on both synthetic and real-world datasets and can effectively recover natural and diverse 3D human body models from 2D clothing images, with the models well aligned with the clothing images at the pixel level.
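Two of the ideas summarized above, adjusting the camera depth through a bone length ratio and generating poses with crossover and mutation, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a weak-perspective camera with a known focal length, so that a bone's 2D length is roughly `focal_length * len_xy / z_cam`, and it represents a pose as a list of per-joint angle tuples. The function names, the `LIMBS` grouping, and all parameters are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import median

# Hypothetical grouping of joint indices into limbs (not from the paper).
LIMBS = {
    "torso": [0, 1],
    "left_arm": [2, 3],
    "right_arm": [4, 5],
}


def estimate_camera_depth(joints_3d, joints_2d, bones, focal_length):
    """Estimate z_cam from the ratio of 3D bone extents to 2D bone lengths.

    Assumes weak perspective: len_2d ~= focal_length * len_xy / z_cam,
    where len_xy is the bone's extent in the camera x-y plane.
    """
    ratios = []
    for parent, child in bones:
        dx = joints_3d[child][0] - joints_3d[parent][0]
        dy = joints_3d[child][1] - joints_3d[parent][1]
        du = joints_2d[child][0] - joints_2d[parent][0]
        dv = joints_2d[child][1] - joints_2d[parent][1]
        len_xy = math.hypot(dx, dy)
        len_2d = math.hypot(du, dv)
        if len_2d > 1e-6:  # skip bones that collapse to a point in the image
            ratios.append(len_xy / len_2d)
    if not ratios:
        raise ValueError("no bone with a usable 2D length")
    # The median keeps a single badly estimated bone from dominating.
    return focal_length * median(ratios)


def crossover(pose_a, pose_b, limbs, rng):
    """Uniform crossover per limb: each limb is copied whole from one parent."""
    child = [tuple(joint) for joint in pose_a]
    for joint_ids in limbs.values():
        if rng.random() < 0.5:
            for j in joint_ids:
                child[j] = tuple(pose_b[j])
    return child


def mutate(pose, rng, rate=0.1, sigma=0.05):
    """Gaussian mutation: perturb each joint's angles with probability `rate`."""
    mutated = []
    for joint in pose:
        if rng.random() < rate:
            mutated.append(tuple(a + rng.gauss(0.0, sigma) for a in joint))
        else:
            mutated.append(tuple(joint))
    return mutated
```

Swapping whole limbs rather than individual joints is one simple way to keep each limb internally consistent in the child pose. A real system would still need to filter candidates for plausibility and for agreement with the clothing image, which this sketch does not attempt.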