Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

Hao Ai, Lu Sheng
2023-11-04
Abstract: Stable Diffusion and ControlNet have achieved excellent results in image generation and synthesis. However, because of the granularity and method of their control, they offer limited efficiency gains for professional artistic work such as comics and animation production, where the main task is secondary painting. In the current workflow, fixing characters and image styles often requires lengthy text prompts, and sometimes further training through Textual Inversion, DreamBooth, or other methods, which is complicated and expensive for painters. We therefore present Stable Diffusion Reference Only, an images-to-image self-supervised model that uses only two types of conditional images for precisely controlled generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is a blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline, and trained a controllable character line art coloring model, available at https://github.com/aihao2000/stable-diffusion-reference-only, that achieved state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses problems in secondary creation (secondary painting) for animation, comics, and fan art, and proposes a new solution. Existing text-guided image generation technologies (such as Stable Diffusion and ControlNet) have achieved significant results in image generation, but they have limitations in professional art creation, especially in comic and animation production. Specifically:

1. **Complexity and cost**: Generating images of specific characters or styles often requires complex text prompts, and may require further training with methods such as Textual Inversion or DreamBooth, which increases the workload and cost for artists.
2. **Limits of precise control**: Current methods have difficulty extracting concepts directly from new images and applying them during generation. Specific characters or image styles are often hard to describe clearly in words.

To address these issues, the paper proposes a new method called "Stable Diffusion Reference Only." This is a self-supervised model that achieves precise control over the generated image with only two types of conditional images, thereby accelerating the secondary creation process. The two types of conditional images are:

- **Image Prompt**: Provides the concept and color information needed for the generated image. For example, it can be a character design sheet.
- **Blueprint Image**: Controls the visual structure of the generated image. It plays a role similar to the conditional image in ControlNet but does not incur the same resource cost or require additional training.

By embedding these two types of conditional images into the original UNet architecture, the model can generate new images with specific styles and characters without additional training. This simplifies the workflow and improves the efficiency of creating animation, comics, and fan art.
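The self-supervised setup with two conditional images can be illustrated by how a single training example might be assembled. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes the blueprint is derived from the target image itself, and it uses a crude neighbour-difference edge detector as a stand-in for whatever line art extractor the paper actually uses. The names `to_gray`, `crude_blueprint`, and `make_training_example` are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels
Gray = List[List[int]]         # rows of 0..255 intensities


def to_gray(img: Image) -> Gray:
    """Convert RGB to grayscale with integer BT.601 luma weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in img]


def crude_blueprint(img: Image, threshold: int = 40) -> Gray:
    """Stand-in line extractor (an assumption, not the paper's method).

    A pixel becomes a line pixel (0) when its intensity differs from its
    right or lower neighbour by more than `threshold`; every other pixel
    stays background (255).
    """
    gray = to_gray(img)
    height, width = len(gray), len(gray[0])
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(gray[y][x] - gray[y][x + 1]) if x + 1 < width else 0
            dy = abs(gray[y][x] - gray[y + 1][x]) if y + 1 < height else 0
            if max(dx, dy) > threshold:
                out[y][x] = 0
    return out


def make_training_example(target: Image, image_prompt: Image) -> Dict[str, object]:
    """Bundle the two conditions with the image the model should reproduce.

    No manual labels are involved: the blueprint is computed from the
    target, and the image prompt is assumed to be another picture that
    carries the same character and colors.
    """
    return {
        "image_prompt": image_prompt,           # concept and color information
        "blueprint": crude_blueprint(target),   # visual structure
        "target": target,                       # supervision signal
    }
```

In this framing the model would be trained to recover `target` from the other two entries, which is what makes the task self-supervised: structure comes from the blueprint, while colors and character identity have to be taken from the image prompt.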
In summary, the paper aims to address the limitations of existing text-based image generation technologies in the field of secondary creation by introducing a new multi-condition diffusion model, enabling artists to create more efficiently.