ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild

Chen Guo,Tianjian Jiang,Manuel Kaufmann,Chengwei Zheng,Julien Valentin,Jie Song,Otmar Hilliges

2024-09-29

Abstract: While previous years have seen great progress in the 3D reconstruction of humans from monocular videos, few of the state-of-the-art methods are able to handle loose garments that exhibit large non-rigid surface deformations during articulation. This limits the application of such methods to humans that are dressed in standard pants or T-shirts. Our method, ReLoo, overcomes this limitation and reconstructs high-quality 3D models of humans dressed in loose garments from monocular in-the-wild videos. To tackle this problem, we first establish a layered neural human representation that decomposes clothed humans into a neural inner body and outer clothing. On top of the layered neural representation, we further introduce a non-hierarchical virtual bone deformation module for the clothing layer that can freely move, which allows the accurate recovery of non-rigidly deforming loose clothing. A global optimization jointly optimizes the shape, appearance, and deformations of the human body and clothing via multi-layer differentiable volume rendering. To evaluate ReLoo, we record subjects with dynamically deforming garments in a multi-view capture studio. This evaluation, both on existing and our novel dataset, demonstrates ReLoo's clear superiority over prior art on both indoor datasets and in-the-wild videos.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the reconstruction of 3D models of humans wearing loose clothing from monocular video. Although 3D human reconstruction from monocular video has progressed considerably in recent years, few state-of-the-art methods can handle loose garments that exhibit large non-rigid surface deformations during articulation. This restricts such methods largely to subjects wearing standard pants or T-shirts. The paper proposes ReLoo, which overcomes this limitation and reconstructs high-quality 3D models of humans in loose garments from monocular in-the-wild videos. ReLoo rests on three core ideas:

1. **Layered Neural Human Representation**: The clothed human is decomposed into two layers, a neural inner body and an outer clothing layer.
2. **Virtual Bone Deformation Module**: On top of the layered representation, a non-hierarchical virtual bone deformation module lets the clothing layer move freely, which allows accurate recovery of non-rigidly deforming loose clothing.
3. **Global Optimization**: The shape, appearance, and deformations of the body and the clothing are optimized jointly via multi-layer differentiable volume rendering.

With these components, ReLoo produces higher-quality and temporally consistent 3D reconstructions of highly dynamic loose clothing. To evaluate the method, the authors record subjects with dynamically deforming garments in a multi-view capture studio. The evaluation, on both existing data and this new dataset, shows that ReLoo outperforms prior art on indoor datasets and in-the-wild videos, in both detail recovery and overall quality.
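The second and third ideas can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array shapes, and the density-weighted color mixing rule are assumptions chosen for illustration. The first function blends independent rigid bone transforms (no parent-child chain, which is one plausible reading of "non-hierarchical"), and the second composites a single ray through two overlapping density fields using the standard volume rendering quadrature.

```python
import numpy as np

def deform_with_virtual_bones(points, weights, rotations, translations):
    """Blend-skin canonical points with independent (non-hierarchical) bones.

    points:       (N, 3) canonical clothing points
    weights:      (N, B) skinning weights, each row summing to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3) per-bone translations
    All names and shapes are hypothetical, not taken from the paper.
    """
    # Apply every bone's rigid transform to every point: shape (B, N, 3).
    per_bone = np.einsum("bij,nj->bni", rotations, points) + translations[:, None, :]
    # Weighted sum over bones; no kinematic tree is traversed.
    return np.einsum("nb,bni->ni", weights, per_bone)

def composite_two_layers(sigma_body, sigma_cloth, rgb_body, rgb_cloth, deltas):
    """Render one ray through a body layer and a clothing layer.

    sigma_*: (S,) densities at S samples, rgb_*: (S, 3) colors,
    deltas:  (S,) distances between consecutive samples.
    Assumption: the layers are merged by summing densities and mixing
    colors in proportion to each layer's density.
    """
    sigma = sigma_body + sigma_cloth
    mixed_rgb = (sigma_body[:, None] * rgb_body + sigma_cloth[:, None] * rgb_cloth)
    mixed_rgb = mixed_rgb / np.maximum(sigma, 1e-8)[:, None]
    # Per-sample opacity and the transmittance reaching each sample.
    alpha = 1.0 - np.exp(-sigma * deltas)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    sample_weights = transmittance * alpha
    color = (sample_weights[:, None] * mixed_rgb).sum(axis=0)
    opacity = sample_weights.sum()
    return color, opacity
```

Because both functions consist only of differentiable array operations, the same structure carries over to an autodiff framework, where gradients of a rendering loss can reach the densities, colors, skinning weights, and bone transforms together.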

SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video

DLCA-Recon: Dynamic Loose Clothing Avatar Reconstruction from Monocular Videos

DressRecon: Freeform 4D Human Reconstruction from Monocular Video

MonoClothCap: Towards Temporally Coherent Clothing Capture from Monocular RGB Video

High-Quality Animatable Dynamic Garment Reconstruction from Monocular Videos

ReN Human: Learning Relightable Neural Implicit Surfaces for Animatable Human Rendering

IF-Garments: Reconstructing Your Intersection-Free Multi-Layered Garments from Monocular Videos

DreaMo: Articulated 3D Reconstruction From A Single Casual Video

Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis

MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

PERGAMO: Personalized 3D Garments from Monocular Video

REACTO: Reconstructing Articulated Objects from a Single Video

Innovative AI techniques for photorealistic 3D clothed human reconstruction from monocular images or videos: a survey

Temporally Coherent Full 3D Mesh Human Pose Recovery from Monocular Video

4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations

GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details

Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video

Deep Physics-aware Inference of Cloth Deformation for Monocular Human Performance Capture

MoDA: Modeling Deformable 3D Objects from Casual Videos

LiveCap: Real-time Human Performance Capture from Monocular Video