Abstract: This study investigates identity-preserving image synthesis, an intriguing task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch. Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation, but they come with significant drawbacks. These include the need for extensive resources and time for fine-tuning, as well as the requirement for multiple reference images. To overcome these challenges, our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images. Our model leverages a direct feed-forward mechanism, circumventing the need for intensive fine-tuning, thereby facilitating quick and efficient image generation. Central to our innovation is a hybrid guidance framework, which combines stylized images, facial images, and textual prompts to guide the image generation process. This unique combination enables our model to support a variety of applications, such as artistic portraits and identity-blended images. Our experimental results, including both qualitative and quantitative evaluations, demonstrate the superiority of our method over existing baseline models and previous works, particularly in its remarkable efficiency and ability to preserve the subject's identity with high fidelity.

What problem does this paper attempt to address?

The paper addresses the problem of preserving individual identities in image generation, including the case of synthesizing images that contain multiple characters. Specifically, the research targets the following points:

1. **Identity-preserving image synthesis**: Traditional text-to-image generation models struggle to capture and express the identity features of specific individuals from text descriptions alone, especially when complex facial details are involved.
2. **Efficiency and resource consumption**: Existing methods such as DreamBooth and Textual Inversion can customize generated images, but they require substantial computational resources and time for fine-tuning, and typically need multiple reference images to achieve satisfactory results.
3. **Multi-identity image synthesis**: When synthesizing images with multiple characters of different identities, current techniques often fail to associate each identity with its corresponding character region.

To address these issues, the authors propose a framework named FaceStudio, whose core contributions include:

- **Hybrid guidance strategy**: This strategy combines style images, facial images, and text prompts to guide the image generation process. Generation runs through a direct feed-forward mechanism rather than cumbersome fine-tuning steps, which makes it fast and efficient.
- **Multi-identity cross-attention mechanism**: To handle images containing multiple identities, the authors developed a multi-identity cross-attention mechanism that maps the guidance information from each identity to its specific character region in the image.
- **Experimental validation**: The paper provides qualitative and quantitative experimental results to demonstrate the superiority of the proposed model over baseline models and existing works, particularly in terms of efficiency.

In summary, this research aims to achieve efficient, high-quality identity-preserving image synthesis, especially for human images, by introducing a hybrid-guidance image generation framework that requires no fine-tuning.
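
The idea behind the multi-identity cross-attention mechanism can be illustrated with a small sketch. This summary does not describe the paper's actual layer design, so everything below is an assumption made for illustration: each query position is taken to already carry a region label, the condition tokens serve as both keys and values with no learned projections, and the function names (`cross_attention`, `multi_identity_cross_attention`) are hypothetical. Plain Python lists are used to keep the example dependency-free.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, tokens):
    """Scaled dot-product attention of one query over condition tokens.

    Assumption: tokens act as both keys and values (no learned projections).
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) * scale for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]

def multi_identity_cross_attention(queries, region_ids, identity_tokens, shared_tokens):
    """Restrict each query to the shared tokens plus its own identity's tokens.

    queries:         one query vector per spatial position
    region_ids:      identity index per position, or None for background
    identity_tokens: identity_tokens[i] is the token list for identity i
    shared_tokens:   tokens visible to every position (e.g. the text prompt)
    """
    outputs = []
    for query, region in zip(queries, region_ids):
        visible = list(shared_tokens)
        if region is not None:
            # Only the identity that owns this region may guide this position.
            visible += identity_tokens[region]
        outputs.append(cross_attention(query, visible))
    return outputs

# Toy example: two identities, three positions (one per region plus background).
text_tokens = [[0.0, 0.0]]
identity_tokens = [[[10.0, 0.0]], [[0.0, 10.0]]]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
region_ids = [0, 1, None]

out = multi_identity_cross_attention(queries, region_ids, identity_tokens, text_tokens)
for region, vec in zip(region_ids, out):
    print(region, [round(v, 3) for v in vec])
```

In this toy setup, the position in region 0 receives no contribution from identity 1's token and vice versa, while the background position sees only the shared text token. A real diffusion model would apply the same restriction inside its attention layers, typically as a mask over the attention scores rather than by building a separate token list per position.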

FaceStudio: Put Your Face Everywhere in Seconds

FaceChain: A Playground for Identity-Preserving Portrait Generation

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

StableIdentity: Inserting Anybody into Anywhere at First Sight

3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

InstantID: Zero-shot Identity-Preserving Generation in Seconds

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Towards Open-Set Identity Preserving Face Synthesis

CapHuman: Capture Your Moments in Parallel Universes

Few-shots Portrait Generation with Style Enhancement and Identity Preservation

Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis

IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis

Composition-Aided Face Photo-Sketch Synthesis

Sketch realizing: Lifelike portrait synthesis from sketch

Quality Guided Sketch-to-Photo Image Synthesis

AnyFace: Free-style Text-to-Face Synthesis and Manipulation

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Text Guided Person Image Synthesis