3D Clothed Human Reconstruction in the Wild

Gyeongsik Moon, Hyeongjin Nam, Takaaki Shiratori, Kyoung Mu Lee
DOI: https://doi.org/10.48550/arXiv.2207.10053
2022-07-21
Abstract: Although much progress has been made in 3D clothed human reconstruction, most existing methods fail to produce robust results from in-the-wild images, which contain diverse human poses and appearances. This is mainly due to the large domain gap between training datasets and in-the-wild datasets. The training datasets are usually synthetic ones, which contain rendered images from ground-truth (GT) 3D scans. However, such datasets contain simple human poses and less natural image appearances compared to those of real in-the-wild datasets, which makes generalization to in-the-wild images extremely challenging. To resolve this issue, we propose ClothWild, the first 3D clothed human reconstruction framework to address robustness on in-the-wild images. First, for robustness to the domain gap, we propose a weakly supervised pipeline that is trainable with 2D supervision targets of in-the-wild datasets. Second, we design a DensePose-based loss function to reduce ambiguities of the weak supervision. Extensive empirical tests on several public in-the-wild datasets demonstrate that ClothWild produces much more accurate and robust results than state-of-the-art methods. The code is available at: https://github.com/hygenie1228/ClothWild_RELEASE
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of reconstructing a 3D model of a clothed human from a single image captured in an uncontrolled, real-world (in-the-wild) setting. Although 3D clothed human reconstruction has advanced considerably, most existing methods perform poorly on in-the-wild images, which contain diverse human poses and appearances. The main cause is the large domain gap between training datasets and in-the-wild datasets. Training datasets are usually synthetic, consisting of images rendered from 3D scans; they contain simple poses and less natural image appearances than real in-the-wild data, so methods trained on them generalize poorly to in-the-wild images.

To solve this problem, the paper proposes ClothWild, a 3D clothed human reconstruction framework designed to be robust on in-the-wild images. ClothWild addresses the domain gap in two ways:

1. **Weakly Supervised Pipeline**: a weakly supervised pipeline that can be trained with the 2D supervision targets available in in-the-wild datasets.
2. **DensePose-based Loss Function**: a loss function built on DensePose that reduces the ambiguity of the weak supervision.

Tests on multiple public in-the-wild datasets show that ClothWild produces more accurate and robust results than existing state-of-the-art methods.
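
To make the idea of a DensePose-based weak-supervision loss more concrete, the following is a minimal sketch and not the paper's implementation. It assumes (these details are not stated in the text above) that DensePose has already mapped a set of 2D pixels to 3D points on the body surface, that the model predicts an unsigned distance from each such point to the cloth surface, and that a 2D cloth segmentation labels each pixel as cloth or non-cloth. The function name `densepose_cloth_loss` and the `margin` parameter are hypothetical.

```python
import numpy as np

def densepose_cloth_loss(pred_dist, is_cloth, margin=0.01):
    """Illustrative weak-supervision loss (hypothetical, not the authors' code).

    pred_dist: (N,) predicted unsigned distances from DensePose-matched
               3D body points to the cloth surface (assumed model output).
    is_cloth:  (N,) booleans from a 2D cloth segmentation, one per point.
    margin:    assumed minimum distance for points labelled non-cloth.
    """
    pred_dist = np.asarray(pred_dist, dtype=float)
    is_cloth = np.asarray(is_cloth, dtype=bool)

    # Points labelled as cloth should lie on the cloth surface (distance -> 0).
    on_surface = pred_dist * is_cloth

    # Points labelled as non-cloth should be at least `margin` away from it.
    off_surface = np.maximum(margin - pred_dist, 0.0) * ~is_cloth

    return float((on_surface + off_surface).mean())


if __name__ == "__main__":
    # One cloth point slightly off the surface, one non-cloth point on it.
    print(densepose_cloth_loss([0.02, 0.0], [True, False]))
```

The point of the sketch is that only 2D labels are needed: the 2D-to-3D correspondence tells the loss where on the body each label applies, which is one way such a loss could reduce the ambiguity of purely image-space supervision.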