A Method for Training-free Person Image Picture Generation

Tianyu Chen

2023-05-17

Abstract: Current state-of-the-art diffusion models produce excellent results in image generation. However, the generated images are monotonous and largely reflect the distribution of people in the training set, which makes it challenging to generate multiple images of a fixed set of individuals. This problem can often be solved only by fine-tuning the model. Each individual or animated character must therefore be trained before it can be drawn, and the hardware and cost of this training are often beyond the reach of average users, who make up the largest group of users. To solve this problem, the Character Image Feature Encoder model proposed in this paper lets the user simply provide a picture of the character so that the character in the generated image matches expectations. In addition, various details can be adjusted during the process using prompts. Unlike traditional Image-to-Image models, the Character Image Feature Encoder extracts only the relevant image features, rather than information about the model's composition or movements. The Character Image Feature Encoder can also be adapted to different models after training. The proposed model can be conveniently incorporated into the Stable Diffusion generation process without modifying the model itself, or used in combination with Stable Diffusion as a joint model.

Computer Vision and Pattern Recognition, Artificial Intelligence, Graphics, Machine Learning

What problem does this paper attempt to address?

The paper addresses the need for specialized training when generating images of a specific character. Existing diffusion models (such as Stable Diffusion) can generate high-quality images, but when asked for multiple images of the same character they produce results that are not very similar to one another. Achieving consistency typically requires specialized fine-tuning of the model, which is time-consuming, labor-intensive, and costly in hardware, putting it out of reach for ordinary users. To solve this problem, the authors propose a new model called the "Character Image Feature Encoder" (abbreviated as CIFE or Character Encoder). With it, users can generate character images that closely resemble the original in appearance by simply providing a picture of the character and some descriptive prompts, with no additional training. Ordinary users can thus use AI generation to create images of a character in a variety of scenarios while keeping the character's features consistent.
