StableIdentity: Inserting Anybody into Anywhere at First Sight

Qinghe Wang, Xu Jia, Xiaomin Li, Taiqing Li, Liqian Ma, Yunzhi Zhuge, Huchuan Lu

2024-01-29

Abstract: Recent advances in large pretrained text-to-image models have shown unprecedented capabilities for high-quality human-centric generation; however, customizing face identity is still an intractable problem. Existing methods cannot ensure stable identity preservation and flexible editability, even with several images for each subject during training. In this work, we propose StableIdentity, which allows identity-consistent recontextualization with just one face image. More specifically, we employ a face encoder with an identity prior to encode the input face, and then land the face representation into a space with an editable prior, which is constructed from celeb names. By incorporating the identity prior and the editability prior, the learned identity can be injected anywhere with various contexts. In addition, we design a masked two-phase diffusion loss to boost the pixel-level perception of the input face and maintain the diversity of generation. Extensive experiments demonstrate that our method outperforms previous customization methods. In addition, the learned identity can be flexibly combined with off-the-shelf modules such as ControlNet. Notably, to the best of our knowledge, we are the first to directly inject the identity learned from a single image into video/3D generation without finetuning. We believe that the proposed StableIdentity is an important step to unify image, video, and 3D customized generation models.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper primarily addresses several key issues in the task of customized generation, particularly in the one-shot training setting for human faces. Specifically, the research aims to achieve the following goals:

1. **Stable Facial Identity Preservation**: Existing methods often fail to consistently maintain the identity features of the input face across different contexts.
2. **Flexible Editability**: Even when multiple images per person are used during training, existing methods cannot ensure stable identity preservation and flexible editing.
3. **Efficiency and Practicality**: Some methods require long optimization times or large datasets to train a general encoder, making it difficult to capture unique identity details.

To address these issues, the paper proposes a method named StableIdentity. This method allows for identity-consistent recontextualization using only a single facial image. Specifically, StableIdentity employs a pre-trained facial recognition model as a face encoder to capture identity representations and utilizes celebrity names to construct an editable identity distribution space. Additionally, the paper designs a masked two-phase diffusion loss to enhance pixel-level perception of the input face and learn more stable facial identity features.

In summary, the goal of this research is to improve the flexibility and efficiency of customized generation while ensuring the stability of identity features. StableIdentity is not only suitable for image-level customized generation but can also be seamlessly integrated with video and 3D generation models without the need for additional fine-tuning steps.
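To make the idea of a masked two-phase diffusion loss more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that "masked" means the error is averaged only over face-region positions, and that the two phases are chosen by timestep: noisier steps use the usual noise-prediction target, while less noisy steps compare a predicted clean image against the input face. The function names, the `split` threshold, and the flattened-list inputs are all hypothetical choices made for illustration.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over the positions where mask is 1."""
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = sum(mask)
    return num / den if den else 0.0


def masked_two_phase_loss(t, num_steps, noise_pred, noise,
                          x0_pred, x0, mask, split=0.6):
    """Select a masked objective according to the denoising phase.

    Assumption (not stated in the summary above): timesteps at or above
    split * num_steps use the noise-prediction target, and lower
    timesteps use a reconstruction target on the predicted clean image.
    """
    if t >= split * num_steps:
        return masked_mse(noise_pred, noise, mask)
    return masked_mse(x0_pred, x0, mask)


# Toy example: four "pixels", of which only the first two belong to the face.
mask = [1, 1, 0, 0]
noise_pred, noise = [0.5, 0.0, 9.0, 9.0], [0.0, 0.0, 0.0, 0.0]
x0_pred, x0 = [1.0, 2.0, 5.0, 5.0], [1.0, 0.0, 0.0, 0.0]

early = masked_two_phase_loss(900, 1000, noise_pred, noise, x0_pred, x0, mask)
late = masked_two_phase_loss(100, 1000, noise_pred, noise, x0_pred, x0, mask)
print(early, late)
```

In the toy example, the large errors at the two background positions do not affect either loss value, which is the effect the mask is meant to have: the optimization focuses on the face region rather than on the surrounding context.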

FaceChain: A Playground for Identity-Preserving Portrait Generation

FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment

Realistic Face Reenactment Via Self-Supervised Disentangling of Identity and Pose

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation

FaceStudio: Put Your Face Everywhere in Seconds

Inserting Anybody in Diffusion Models via Celeb Basis

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

StableSwap: Stable Face Swapping in a Shared and Controllable Latent Space

Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis

CapHuman: Capture Your Moments in Parallel Universes

StableAnimator: High-Quality Identity-Preserving Human Image Animation

InstantID: Zero-shot Identity-Preserving Generation in Seconds

Magic-Me: Identity-Specific Video Customized Diffusion

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Beyond Inserting: Learning Identity Embedding for Semantic-Fidelity Personalized Diffusion Generation

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

Towards Open-Set Identity Preserving Face Synthesis