PuzzleAvatar: Assembling 3D Avatars from Personal Albums

Yuliang Xiu, Yufei Ye, Zhen Liu, Dimitrios Tzionas, Michael J. Black
2024-09-15
Abstract: Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar in return? The challenge is that such casual photo collections contain diverse poses, challenging viewpoints, cropped views, and occlusion (albeit with a consistent outfit, accessories, and hairstyle). We address this novel "Album2Human" task by developing PuzzleAvatar, a novel model that generates a faithful 3D avatar (in a canonical pose) from a personal OOTD album, while bypassing the challenging estimation of body and camera pose. To this end, we fine-tune a foundational vision-language model (VLM) on such photos, encoding the appearance, identity, garments, hairstyles, and accessories of a person into (separate) learned tokens and instilling these cues into the VLM. In effect, we exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar. Importantly, we can customize avatars by simply interchanging tokens. As a benchmark for this new task, we collect a new dataset, called PuzzleIOI, with 41 subjects in a total of nearly 1K OOTD configurations, in challenging partial photos with paired ground-truth 3D bodies. Evaluation shows that PuzzleAvatar not only has high reconstruction accuracy, outperforming TeCH and MVDreamBooth, but also a unique scalability to album photos, and strong robustness. Our code and data are publicly available for research purposes at https://puzzleavatar.is.tue.mpg.de/
Computer Vision and Pattern Recognition, Artificial Intelligence, Graphics
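The abstract's claim that avatars can be customized "by simply interchanging tokens" can be illustrated with a minimal Python sketch. It assumes each subject's learned tokens are stored as a mapping from part name to a placeholder token string, and that a text prompt referencing those tokens conditions generation. The `Puzzle` class, the `swap` function, the token strings, and the prompt template are all hypothetical and are not PuzzleAvatar's actual code or prompt format.

```python
from dataclasses import dataclass, field


@dataclass
class Puzzle:
    """One subject's learned tokens, keyed by part name (hypothetical layout)."""
    subject: str
    tokens: dict[str, str] = field(default_factory=dict)

    def prompt(self) -> str:
        # Compose a text prompt that mentions every part token, in insertion order.
        # The template is an illustrative assumption, not the paper's prompt.
        pieces = [f"{tok} {part}" for part, tok in self.tokens.items()]
        return "a photo of a person with " + ", ".join(pieces)


def swap(base: Puzzle, donor: Puzzle, parts: list[str]) -> Puzzle:
    """Return a new Puzzle taking `parts` from `donor` and everything else from `base`."""
    missing = [p for p in parts if p not in donor.tokens]
    if missing:
        raise KeyError(f"donor has no token for: {missing}")
    tokens = dict(base.tokens)  # copy, so the base subject's tokens stay untouched
    for p in parts:
        tokens[p] = donor.tokens[p]  # replacing a key keeps its original position
    return Puzzle(subject=f"{base.subject}+{donor.subject}", tokens=tokens)
```

For example, `swap(alice, bob, ["top"])` on two hypothetical puzzles keeps Alice's face and hair tokens but takes Bob's top token, so only the garment changes in the composed prompt. Returning a new `Puzzle` instead of mutating the base keeps each subject's learned tokens reusable for further combinations.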
What problem does this paper attempt to address?
This paper addresses how to generate a faithful, personalized 3D human avatar from an individual's collection of everyday photos. It proposes PuzzleAvatar, a method that works from a person's "Outfit Of The Day" (OOTD) photo collection. Such photos typically vary in pose, viewpoint, background, and lighting, and are often cropped or occluded, but they share a consistent outfit, accessories, and hairstyle; PuzzleAvatar exploits these consistent cues to reconstruct a high-quality 3D avatar. The authors call this task "Album2Human". Its challenge lies in handling personal photos taken in uncontrolled settings, whereas existing reconstruction methods usually require full-body images or videos captured in well-controlled environments.

PuzzleAvatar avoids explicitly estimating body pose and camera parameters. Instead, it decomposes a person's photos into components (such as garments, accessories, hairstyle, and facial features) and encodes each component into a separate learned token. These tokens are then assembled like puzzle pieces to generate a 3D avatar, and avatars can be customized by swapping tokens. According to the authors, this yields higher reconstruction accuracy than TeCH and MVDreamBooth, scales to album photos, and is robust to varied inputs.

The main contributions of the paper are:
1. **Task definition**: A new task, "Album2Human", i.e., reconstructing a 3D human avatar from an individual's photo collection.
2. **Dataset**: To evaluate this task, a new dataset, PuzzleIOI, containing 41 subjects in nearly 1,000 outfit, accessory, and hairstyle configurations, with paired ground-truth 3D bodies.
3. **Method**: PuzzleAvatar, which adopts a new "reconstruction as conditional generation" paradigm. A personalized text-to-image (T2I) model implicitly canonicalizes the human body, avoiding explicit pose estimation and pixel-level reprojection losses.
4. **Evaluation**: Detailed experiments and ablation studies analyzing the effectiveness and scalability of PuzzleAvatar, plus demonstrations of downstream applications such as character editing and virtual try-on.

Overall, the paper offers a new approach to generating high-quality 3D human avatars from casual personal photos, with practical value for applications such as AR/VR.
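The decomposition of an album into per-part tokens can be sketched as a data-preparation step in the style of multi-concept personalization. This minimal Python sketch assumes part masks have already been produced by some segmenter and reduces each photo to the set of parts visible in it; `AlbumPhoto`, `build_training_pairs`, `coverage`, and the prompt template are hypothetical names for illustration, not the paper's pipeline. What it shows is why no body pose or camera is needed: a cropped or occluded photo simply supervises fewer part tokens. The masks themselves, which a real pipeline would presumably use to localize the training loss, are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlbumPhoto:
    """One casual photo; part masks are assumed to come from an external segmenter."""
    path: str
    visible_parts: frozenset[str]  # parts whose (assumed) mask is non-empty


def build_training_pairs(
    album: list[AlbumPhoto], part_tokens: dict[str, str]
) -> list[tuple[str, str, tuple[str, ...]]]:
    """Pair each photo with a prompt naming only the parts visible in it.

    No body pose or camera is estimated: a cropped or occluded photo just
    contributes to fewer part tokens.
    """
    pairs = []
    for photo in album:
        # Keep only parts that have a learned token; sort for a stable prompt.
        parts = sorted(p for p in photo.visible_parts if p in part_tokens)
        if not parts:
            continue  # nothing this photo can teach any token
        prompt = "a photo of a person with " + ", ".join(
            f"{part_tokens[p]} {p}" for p in parts
        )
        pairs.append((photo.path, prompt, tuple(parts)))
    return pairs


def coverage(pairs: list[tuple[str, str, tuple[str, ...]]]) -> dict[str, int]:
    """Count how many photos supervise each part token."""
    counts: dict[str, int] = {}
    for _, _, parts in pairs:
        for p in parts:
            counts[p] = counts.get(p, 0) + 1
    return counts
```

For an album with one full-body shot, one selfie, and one close-up of shoes, the face and hair tokens each receive two supervising photos, while the top, bottom, and shoes tokens receive one each.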