A Method for Training-free Person Image Picture Generation

Tianyu Chen
2023-05-17
Abstract: Current state-of-the-art diffusion models produce excellent images. However, the people they depict are monotonous, largely reflecting the distribution of people in the training set, which makes it difficult to generate multiple images of a fixed set of individuals. This problem can usually be solved only by fine-tuning the model. Every individual or animated character must therefore be trained separately before it can be drawn, and the hardware and cost of that training are often beyond the reach of average users, who make up the largest group of users. To solve this problem, the Character Image Feature Encoder model proposed in this paper lets the user simply provide a picture of the character so that the character in the generated image matches expectations. Various details can also be adjusted during generation using prompts. Unlike traditional Image-to-Image models, the Character Image Feature Encoder extracts only the relevant image features, rather than information about composition or movements. In addition, once trained, the Character Image Feature Encoder can be adapted to different models. The proposed model can be incorporated into the Stable Diffusion generation process without modifying the model itself, or combined with Stable Diffusion as a joint model.
Computer Vision and Pattern Recognition, Artificial Intelligence, Graphics, Machine Learning
What problem does this paper attempt to address?
The paper addresses the need for specialized training when generating images of specific characters. Existing diffusion models (such as Stable Diffusion) can generate high-quality images, but when they generate multiple images of the same character, the character's appearance is not consistent across them. Achieving consistency typically requires fine-tuning the model for that character. This fine-tuning is time-consuming and labor-intensive, and it incurs high hardware costs, putting it out of reach for ordinary users. To solve this problem, the authors propose a new model called "Character Image Feature Encoder" (abbreviated as CIFE or Character Encoder). The user provides a picture of the character and some descriptive prompts, and the model generates character images that closely match the original in appearance, with no additional training. Ordinary users can thus use AI generation to create character images in various scenarios while keeping the character's features consistent.
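
The summary above does not say how the encoder's output enters the generator. One common way to condition a frozen diffusion model on extra features is to append them to the prompt embeddings that the generator already consumes. The sketch below illustrates only that idea, under that assumption, with toy pure-Python stand-ins: `encode_prompt`, `encode_character`, and `build_conditioning` are hypothetical names, the hash-based embedding replaces any learned network, and none of it is taken from the paper or from the Stable Diffusion API.

```python
import hashlib
from typing import List

EMBED_DIM = 8  # toy embedding width; real models use far wider vectors

Vector = List[float]


def _toy_embed(data: bytes) -> Vector:
    """Deterministic stand-in for a learned embedding: hash bytes into floats in [0, 1)."""
    digest = hashlib.sha256(data).digest()
    return [b / 256.0 for b in digest[:EMBED_DIM]]


def encode_prompt(prompt: str) -> List[Vector]:
    """Hypothetical text encoder: one token vector per whitespace-separated word."""
    return [_toy_embed(word.encode("utf-8")) for word in prompt.split()]


def encode_character(image_bytes: bytes, num_tokens: int = 4) -> List[Vector]:
    """Hypothetical character encoder: a fixed number of appearance tokens per picture."""
    return [_toy_embed(image_bytes + bytes([i])) for i in range(num_tokens)]


def build_conditioning(prompt: str, image_bytes: bytes) -> List[Vector]:
    """Append character tokens to prompt tokens (assumed mechanism).

    Only the conditioning sequence changes; the generator's weights are untouched.
    """
    return encode_prompt(prompt) + encode_character(image_bytes)


# One reference picture (placeholder bytes), two different scene prompts.
reference = b"placeholder bytes standing in for a reference picture"
cond_beach = build_conditioning("character standing on a beach", reference)
cond_city = build_conditioning("character walking through a city at night", reference)
```

In this sketch the last four vectors of `cond_beach` and `cond_city` are identical because they depend only on the reference picture, while the leading vectors follow the prompt. That mirrors the behavior described above: the prompt varies the scene and the character's appearance stays fixed, with no per-character fine-tuning step.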