TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model

Yongming Zhang, Tianyu Zhang, Haoran Xie
2024-05-08
Abstract: Deep learning-based sketch-to-clothing image generation provides the initial designs and inspiration in the fashion design processes. However, clothing generation from freehand drawing is challenging due to the sparse and ambiguous information from the drawn sketches. The current generation models may have difficulty generating detailed texture information. In this work, we propose TexControl, a sketch-based fashion generation framework that uses a two-stage pipeline to generate the fashion image corresponding to the sketch input. First, we adopt ControlNet to generate the fashion image from sketch and keep the image outline stable. Then, we use an image-to-image method to optimize the detailed textures of the generated images and obtain the final results. The evaluation results show that TexControl can generate fashion images with high-quality texture as fine-grained image generation.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The question this paper addresses is: **when generating fashion clothing images from hand-drawn sketches, how can the generated images be given high-fidelity texture and material details?** Existing deep-learning-based sketch-to-clothing generation methods struggle with freehand sketches because the sketches carry sparse and ambiguous information, so the generated images tend to lack detailed texture.

### Main problems:

1. **Sparse and ambiguous sketch information**: A hand-drawn sketch provides limited information, which makes it difficult to constrain the model's generation process accurately.
2. **Poor texture quality of the generated image**: Existing models have difficulty producing high-quality textures and material details when generating clothing images.
3. **Insufficient controllability of the generation results**: Users cannot easily control the specific texture and material properties of the generated images.

### Solutions:

To address these problems, the paper proposes **TexControl**, a two-stage framework for generating high-quality fashion clothing images from hand-drawn sketches. Its main features are as follows:

1. **First stage (basic generation stage)**:
   - Use the **ControlNet Scribble** model to generate an outline preview image from the input sketch, so that the contour of the generated image stays consistent with the sketch.
   - Add text prompts as constraints, which narrows the diversity of the generated results and keeps the image close to what the user expects.
2. **Second stage (texture control stage)**:
   - Use the **ControlNet ip2p** and **LDM** models to refine the outline preview and generate the final image with detailed textures and the specified materials.
   - Adjust the weights of the generation model through the model merge technique to further improve the quality of the generated image.

### Experimental results:

The experiments show that TexControl generates clothing images with high-quality textures while closely following the contour of the input sketch. Compared with existing SOTA methods, TexControl is reported to perform better on complex materials and fine textures.

### Summary:

By introducing a two-stage generation framework, the paper addresses the main challenges of generating high-quality fashion clothing images from hand-drawn sketches, with its gains concentrated in texture generation and controllability.
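
The summary does not say how the model merge in the second stage is carried out. As a minimal sketch, assuming the merge is a plain weighted average of two checkpoints that share the same parameter names (a common way to merge diffusion checkpoints, not necessarily the authors' procedure), the idea could look like this. The names `merge_weights`, `alpha`, `base`, and `texture` are hypothetical, and short Python lists stand in for real weight tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_weights(model_a: Weights, model_b: Weights, alpha: float) -> Weights:
    """Linearly interpolate two checkpoints: (1 - alpha) * A + alpha * B.

    Assumption: both checkpoints come from the same architecture, so they
    expose identical parameter names and shapes.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if model_a.keys() != model_b.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: Weights = {}
    for name, a_vals in model_a.items():
        b_vals = model_b[name]
        if len(a_vals) != len(b_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(a_vals, b_vals)]
    return merged


# Toy example: alpha = 0.25 keeps 75% of the base model and takes 25%
# from a (hypothetical) texture-focused model.
base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
texture = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}
merged = merge_weights(base, texture, alpha=0.25)
print(merged)
```

With `alpha` close to 0 the merged model stays near the base checkpoint, and with `alpha` close to 1 it approaches the second one. In practice the ratio would be tuned by inspecting the generated textures.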