DiCTI: Diffusion-based Clothing Designer via Text-guided Input

Ajda Lampe, Julija Stopar, Deepak Kumar Jain, Shinichiro Omachi, Peter Peer, Vitomir Štruc
2024-07-04
Abstract: Recent developments in deep generative models have opened up a wide range of opportunities for image synthesis, leading to significant changes in various creative fields, including the fashion industry. While numerous methods have been proposed to benefit buyers, particularly in virtual try-on applications, there has been relatively less focus on facilitating fast prototyping for designers and customers seeking to order new designs. To address this gap, we introduce DiCTI (Diffusion-based Clothing Designer via Text-guided Input), a straightforward yet highly effective approach that allows designers to quickly visualize fashion-related ideas using text inputs only. Given an image of a person and a description of the desired garments as input, DiCTI automatically generates multiple high-resolution, photorealistic images that capture the expressed semantics. By leveraging a powerful diffusion-based inpainting model conditioned on text inputs, DiCTI is able to synthesize convincing, high-quality images with varied clothing designs that viably follow the provided text descriptions, while being able to process very diverse and challenging inputs, captured in completely unconstrained settings. We evaluate DiCTI in comprehensive experiments on two different datasets (VITON-HD and Fashionpedia) and in comparison to the state-of-the-art (SoTA). The results of our experiments show that DiCTI convincingly outperforms the SoTA competitor in generating higher quality images with more elaborate garments and superior text prompt adherence, both according to standard quantitative evaluation measures and human ratings, generated as part of a user study.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the communication gap between designers and clients during rapid prototyping, especially when customizing new clothing designs. Many current methods focus on the consumer experience through applications such as virtual try-on, while the need for designers to quickly visualize fashion concepts has received less attention. To fill this gap, the authors propose DiCTI (Diffusion-based Clothing Designer via Text-guided Input), which lets designers quickly generate high-quality, realistic clothing images using only text input. Specifically, the goals of DiCTI are:

1. **Rapid Visualization of Fashion Concepts**: Designers can quickly generate multiple high-resolution, realistic clothing images from simple text descriptions.
2. **Improving Design Efficiency**: Automating the generation process reduces the time and effort designers spend in the initial design phase.
3. **Enhancing User Engagement**: Consumers can describe their needs in text, communicate more effectively with designers, or search for similar designs on the internet.

### Solution Overview

DiCTI uses a pre-trained, text-conditioned diffusion inpainting model to generate high-quality images from text descriptions. Given an image of a person and a description of the desired garments, the main steps are:

1. **Mask Generation Module**: Generates binary masks for the body and face to guide subsequent image editing.
2. **Clothing Synthesis Module**: Generates new clothing designs in the masked areas based on the text description.
3. **Identity Preservation Module**: Ensures that facial features in the generated images remain consistent with the original image.

### Experiments and Results

The authors conducted comprehensive experiments on two datasets (VITON-HD and Fashionpedia) and compared DiCTI with an existing state-of-the-art method, FICE.

The experimental results show that DiCTI outperforms FICE in both image quality and adherence to the text description, according to standard quantitative measures and human ratings from a user study. Specifically, it excels in the following aspects:

- **Image Quality**: DiCTI generates more realistic and detailed images.
- **Text Description Consistency**: DiCTI follows text descriptions more closely when generating the requested clothing designs.
- **Identity Preservation**: DiCTI preserves facial features well, especially skin tone consistency.

### Conclusion

DiCTI provides designers and consumers with an efficient and intuitive tool for quickly generating high-quality clothing design images. Through text-guided image editing, DiCTI improves design efficiency and also enhances user engagement and satisfaction.
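
The three modules listed under Solution Overview can be read as a simple mask, inpaint, and composite pipeline. The sketch below is only an illustration of that data flow, not the authors' implementation. All function names are hypothetical, the masks are assumed to be supplied by some external segmentation step, and `stub_inpaint` is a placeholder that fills the editable region with random pixels so the example runs without a diffusion model.

```python
import random

def make_edit_mask(body_mask, face_mask):
    """Editable region: body pixels that are not face pixels (assumed rule)."""
    return [[b and not f for b, f in zip(body_row, face_row)]
            for body_row, face_row in zip(body_mask, face_mask)]

def stub_inpaint(image, edit_mask, prompt, seed=0):
    """Hypothetical stand-in for a text-conditioned diffusion inpainting model.

    Replaces masked pixels with random RGB values. The prompt is ignored here;
    a real model would condition the generated content on it.
    """
    rng = random.Random(seed)
    return [[(rng.randrange(256), rng.randrange(256), rng.randrange(256)) if m else px
             for px, m in zip(row, mask_row)]
            for row, mask_row in zip(image, edit_mask)]

def preserve_identity(original, generated, face_mask):
    """Copy the original face pixels back into the generated image."""
    return [[o if f else g for o, g, f in zip(orig_row, gen_row, face_row)]
            for orig_row, gen_row, face_row in zip(original, generated, face_mask)]

def design_clothing(image, body_mask, face_mask, prompt, num_samples=3):
    """Produce several candidate designs for one person image and one prompt."""
    edit_mask = make_edit_mask(body_mask, face_mask)
    results = []
    for seed in range(num_samples):
        generated = stub_inpaint(image, edit_mask, prompt, seed=seed)
        results.append(preserve_identity(image, generated, face_mask))
    return results

# Toy 8x8 "photo": uniform gray image, rectangular body and face masks.
H, W = 8, 8
image = [[(100, 100, 100)] * W for _ in range(H)]
body_mask = [[1 <= y < 8 and 2 <= x < 6 for x in range(W)] for y in range(H)]
face_mask = [[1 <= y < 3 and 3 <= x < 5 for x in range(W)] for y in range(H)]

designs = design_clothing(image, body_mask, face_mask,
                          "a red floral summer dress", num_samples=3)
print(len(designs), "candidate designs generated")
```

In this sketch, varying the seed is what yields multiple candidates per prompt, and pasting the face pixels back after generation guarantees that pixels outside the editable region are unchanged. Whether DiCTI enforces identity preservation in exactly this way is an assumption of the example.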