FaceStudio: Put Your Face Everywhere in Seconds

Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu
2023-12-06
Abstract: This study investigates identity-preserving image synthesis, an intriguing task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch. Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation, but they come with significant drawbacks. These include the need for extensive resources and time for fine-tuning, as well as the requirement for multiple reference images. To overcome these challenges, our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images. Our model leverages a direct feed-forward mechanism, circumventing the need for intensive fine-tuning, thereby facilitating quick and efficient image generation. Central to our innovation is a hybrid guidance framework, which combines stylized images, facial images, and textual prompts to guide the image generation process. This unique combination enables our model to produce a variety of applications, such as artistic portraits and identity-blended images. Our experimental results, including both qualitative and quantitative evaluations, demonstrate the superiority of our method over existing baseline models and previous works, particularly in its remarkable efficiency and ability to preserve the subject's identity with high fidelity.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses identity-preserving image synthesis: generating images that keep a specific person's identity while changing style or content, with a particular focus on human images. It targets three problems:

1. **Identity-preserving image synthesis**: Traditional text-to-image generation models struggle to capture and express the identity features of a specific individual from a text description alone, especially complex facial details.
2. **Efficiency and resource consumption**: Existing methods such as DreamBooth and Textual Inversion can customize generated images, but they require substantial computational resources and time for fine-tuning, and they typically need multiple reference images to achieve satisfactory results.
3. **Multi-identity image synthesis**: When an image contains multiple characters with different identities, current techniques often fail to associate each identity with its corresponding character region.

To address these issues, the authors propose a framework named FaceStudio. Its core contributions are:

- **Hybrid guidance strategy**: This strategy combines style images, facial images, and text prompts to guide the image generation process. Generation runs through a direct feed-forward mechanism rather than per-subject fine-tuning, which makes it fast and efficient.
- **Multi-identity cross-attention mechanism**: To handle images containing multiple identities, the authors developed a multi-identity cross-attention mechanism that maps the guidance information from each identity to its specific character region in the image.
- **Experimental validation**: The paper reports qualitative and quantitative results showing that the proposed model outperforms baseline models and previous works, particularly in efficiency and in preserving the subject's identity.

In summary, this research aims to achieve efficient, high-quality identity-preserving image synthesis for human images by introducing a hybrid-guidance image generation framework that requires no fine-tuning.
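The idea behind the multi-identity cross-attention mechanism, routing each identity's guidance only to its own character region, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `multi_identity_cross_attention`, the boolean region masks, and the use of the identity tokens as both keys and values (with no learned projections) are simplifying assumptions made here for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_identity_cross_attention(queries, identity_tokens, region_masks):
    """Region-restricted cross-attention (illustrative sketch, hypothetical API).

    queries:         (N, d) array, one query vector per spatial location.
    identity_tokens: list of K arrays, each (T_k, d); the guidance tokens
                     of identity k, used here as both keys and values.
    region_masks:    (K, N) boolean array; region_masks[k, i] is True when
                     location i belongs to the region of identity k.
                     Regions are assumed not to overlap.
    Returns an (N, d) array. Locations outside every region stay zero.
    """
    n, d = queries.shape
    out = np.zeros((n, d))
    for tokens, mask in zip(identity_tokens, region_masks):
        # Only queries inside this identity's region attend to its tokens.
        scores = queries[mask] @ tokens.T / np.sqrt(d)
        out[mask] = softmax(scores, axis=-1) @ tokens
    return out


# Toy usage: 5 locations, 2 identities with one guidance token each.
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 2))
tokens = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
masks = np.array([
    [True, True, False, False, False],   # region of identity 0
    [False, False, True, True, False],   # region of identity 1
])
out = multi_identity_cross_attention(q, tokens, masks)
```

With a single token per identity the attention weights are trivially 1, so each region's output equals that identity's token, and the unassigned fifth location stays at zero. In a real diffusion model the queries would come from image features and the result would be added back to them; those details are outside what this summary describes.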