Abstract: Generating personalized 3D avatars is crucial for AR/VR. However, recent text-to-3D methods that generate avatars for celebrities or fictional characters struggle with everyday people. Methods for faithful reconstruction typically require full-body images in controlled settings. What if a user could just upload their personal "OOTD" (Outfit Of The Day) photo collection and get a faithful avatar in return? The challenge is that such casual photo collections contain diverse poses, challenging viewpoints, cropped views, and occlusion (albeit with a consistent outfit, accessories and hairstyle). We address this novel "Album2Human" task by developing PuzzleAvatar, a novel model that generates a faithful 3D avatar (in a canonical pose) from a personal OOTD album, while bypassing the challenging estimation of body and camera pose. To this end, we fine-tune a foundational vision-language model (VLM) on such photos, encoding the appearance, identity, garments, hairstyles, and accessories of a person into (separate) learned tokens and instilling these cues into the VLM. In effect, we exploit the learned tokens as "puzzle pieces" from which we assemble a faithful, personalized 3D avatar. Importantly, we can customize avatars by simply interchanging tokens. As a benchmark for this new task, we collect a new dataset, called PuzzleIOI, with 41 subjects in a total of nearly 1K OOTD configurations, in challenging partial photos with paired ground-truth 3D bodies. Evaluation shows that PuzzleAvatar not only achieves high reconstruction accuracy, outperforming TeCH and MVDreamBooth, but also scales uniquely well to album photos and is strongly robust. Our code and data are publicly available for research purposes at https://puzzleavatar.is.tue.mpg.de/
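The abstract describes each appearance cue as a separately learned token that can be interchanged to customize the avatar. The sketch below shows only that bookkeeping, under stated assumptions: the component names, the `<subject_component>` placeholder format, the prompt template, and the `PuzzleSet` class are all hypothetical, and the learned embeddings themselves (the actual fine-tuning of the VLM) are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical component names; the abstract lists identity, garments,
# hairstyles and accessories as separately learned cues.
COMPONENTS = ("face", "hair", "top", "bottom", "shoes", "accessory")


@dataclass
class PuzzleSet:
    """Hypothetical mapping from each component to a placeholder token."""

    subject: str
    tokens: dict[str, str] = field(default_factory=dict)

    @classmethod
    def for_subject(cls, subject: str) -> "PuzzleSet":
        # One unique placeholder per (subject, component), e.g. "<alice_top>".
        # In a real pipeline each placeholder would be bound to a learned
        # embedding during fine-tuning; here it is only a string.
        return cls(subject, {c: f"<{subject}_{c}>" for c in COMPONENTS})

    def prompt(self) -> str:
        # Assemble the "puzzle pieces" into one text prompt for a full-body
        # person in a canonical pose (the template is an assumption).
        pieces = ", ".join(f"{self.tokens[c]} {c}" for c in COMPONENTS)
        return f"a full-body photo of a person in a canonical pose, with {pieces}"

    def swap(self, other: "PuzzleSet", component: str) -> "PuzzleSet":
        # Customize by interchanging one component's token with another
        # subject's; both input sets are left unchanged.
        if component not in self.tokens:
            raise KeyError(f"unknown component: {component}")
        new_tokens = dict(self.tokens)
        new_tokens[component] = other.tokens[component]
        return PuzzleSet(f"{self.subject}+{other.subject}_{component}", new_tokens)


alice = PuzzleSet.for_subject("alice")
bob = PuzzleSet.for_subject("bob")
mixed = alice.swap(bob, "top")  # Alice's identity and outfit, but Bob's top
print(mixed.prompt())
```

Keying tokens by component makes a swap a single dictionary update, which mirrors the idea that each puzzle piece is independent of the others; whether PuzzleAvatar's tokens compose this cleanly in practice is a property of the learned model, not of this sketch.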

What problem does this paper attempt to address?

The paper addresses how to generate a faithful, personalized 3D human avatar from an individual's collection of everyday photos. Specifically, it proposes PuzzleAvatar, a method that uses a person's "Outfit Of The Day" (OOTD) photo collection. Such photos vary in pose, viewpoint, background, and lighting, but share a consistent outfit, accessories, and hairstyle, and PuzzleAvatar uses them to reconstruct a high-quality 3D avatar. The task is called "Album2Human". Its challenge is handling personal photos taken in uncontrolled settings, whereas existing methods usually require full-body images or videos captured in well-controlled environments. PuzzleAvatar bypasses explicit estimation of body pose and camera pose: it decomposes the photos into components (such as garments, accessories, and facial features) and encodes each component into its own learned token. These tokens are then assembled like puzzle pieces to generate the 3D avatar. According to the abstract, this yields higher reconstruction accuracy than TeCH and MVDreamBooth, scales to album photos, and is robust to diverse inputs; interchanging tokens also makes the avatar customizable.

The main contributions of the paper are:

1. **Task Definition**: It introduces a new task, "Album2Human": reconstructing a 3D human avatar from an individual's photo album.
2. **Dataset Creation**: To evaluate this task, it creates a new dataset, PuzzleIOI, containing nearly 1,000 outfit, accessory, and hairstyle configurations for 41 subjects, with paired ground-truth 3D bodies.
3. **Method Innovation**: It proposes PuzzleAvatar, which adopts a "reconstruction as conditional generation" paradigm: a personalized text-to-image (T2I) model implicitly canonicalizes the human body, avoiding explicit pose estimation and pixel reprojection losses.
4. **Performance Evaluation**: Through detailed experiments and ablation studies, it analyzes the effectiveness and scalability of PuzzleAvatar and shows its potential for downstream applications such as character editing and virtual try-on.

Overall, the paper offers a new way to create high-quality 3D human avatars from casual personal photos rather than controlled captures.
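The "Method Innovation" contribution describes reconstruction as conditional generation through a personalized T2I model, without a pixel reprojection loss. The summary does not say how the T2I model supervises the 3D avatar. One standard mechanism, assumed here rather than confirmed by the text above, is Score Distillation Sampling (SDS) from DreamFusion. With avatar parameters $\theta$, a randomly sampled virtual camera $c$, a differentiable renderer $g$, noise $\epsilon \sim \mathcal{N}(0, I)$, timestep $t$ with schedule $(\alpha_t, \sigma_t)$ and weighting $w(t)$, a prompt $y$ containing the learned tokens, and the personalized model's noise prediction $\hat{\epsilon}_\phi$, the SDS gradient is:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta, c)\big) = \mathbb{E}_{t,\epsilon,c}\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right], \qquad z_t = \alpha_t x + \sigma_t \epsilon
$$

Because the supervision comes from the diffusion model's noise prediction on renders from sampled virtual cameras, no input photo has to be registered to the 3D body, which is consistent with avoiding explicit pose estimation and pixel reprojection losses.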

PuzzleAvatar: Assembling 3D Avatars from Personal Albums