ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild

Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, Otmar Hilliges
2024-09-29
Abstract: While previous years have seen great progress in the 3D reconstruction of humans from monocular videos, few of the state-of-the-art methods are able to handle loose garments that exhibit large non-rigid surface deformations during articulation. This limits the application of such methods to humans that are dressed in standard pants or T-shirts. Our method, ReLoo, overcomes this limitation and reconstructs high-quality 3D models of humans dressed in loose garments from monocular in-the-wild videos. To tackle this problem, we first establish a layered neural human representation that decomposes clothed humans into a neural inner body and outer clothing. On top of the layered neural representation, we further introduce a non-hierarchical virtual bone deformation module for the clothing layer that can freely move, which allows the accurate recovery of non-rigidly deforming loose clothing. A global optimization jointly optimizes the shape, appearance, and deformations of the human body and clothing via multi-layer differentiable volume rendering. To evaluate ReLoo, we record subjects with dynamically deforming garments in a multi-view capture studio. This evaluation, both on existing and our novel dataset, demonstrates ReLoo's clear superiority over prior art on both indoor datasets and in-the-wild videos.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the reconstruction of 3D models of humans wearing loose clothing from monocular video. Although 3D human reconstruction from monocular video has progressed considerably in recent years, most existing methods struggle with loose garments that undergo large non-rigid surface deformations during movement. This restricts them largely to subjects wearing standard pants or T-shirts. The paper proposes ReLoo to overcome this limitation and reconstruct high-quality 3D models of humans in loose clothing from monocular in-the-wild videos. ReLoo rests on three core components:

1. **Layered Neural Human Representation**: The clothed human is decomposed into two layers, a neural inner body and an outer clothing layer.
2. **Virtual Bone Deformation Module**: On top of the layered representation, a non-hierarchical virtual bone deformation module lets the clothing layer move freely, which allows accurate recovery of non-rigidly deforming loose clothing.
3. **Global Optimization**: The shape, appearance, and deformations of the body and clothing are optimized jointly via multi-layer differentiable volume rendering.

With these components, ReLoo is reported to produce higher-quality and temporally consistent 3D reconstructions of highly dynamic loose clothing. The experiments, run on existing data and on a new dataset recorded in a multi-view capture studio, show ReLoo outperforming prior art on both indoor datasets and in-the-wild videos, with advantages in recovered detail and overall quality.
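To make the idea of a non-hierarchical virtual bone deformation module more concrete, here is a minimal sketch in plain Python. It assumes the deformation can be approximated by linear blend skinning over bones that each carry their own world-space rigid transform, with no parent-child chain. That blending scheme, the helper names (`rot_z`, `apply_rigid`, `deform_point`), and the toy values are all illustrative assumptions and are not taken from the paper, whose actual module may differ.

```python
import math

def rot_z(angle):
    """3x3 rotation about the z-axis (hypothetical helper for the demo)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(rotation, translation, point):
    """Return rotation @ point + translation for a 3D point."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]

def deform_point(point, bones, weights):
    """Blend independently posed virtual bones (no kinematic tree).

    bones: list of (rotation, translation) pairs, each expressed directly
    in world space, so posing one bone never moves another.
    weights: per-bone skinning weights that sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("skinning weights must sum to 1")
    out = [0.0, 0.0, 0.0]
    for (rotation, translation), w in zip(bones, weights):
        moved = apply_rigid(rotation, translation, point)
        for i in range(3):
            out[i] += w * moved[i]
    return out

# Toy example (assumed values): one static bone and one free-moving bone.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bones = [
    (identity, [0.0, 0.0, 0.0]),            # bone 0 stays in place
    (rot_z(math.pi / 2), [0.0, 0.0, 1.0]),  # bone 1 rotates and lifts
]
hem_point = [1.0, 0.0, 0.0]
deformed = deform_point(hem_point, bones, [0.5, 0.5])
```

In a hierarchical skeleton, each bone's transform would be composed with its parent's, so a garment point could only follow the articulated body. Here each bone is posed on its own, which is one plausible reading of how a clothing layer can "freely move" away from the body.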