FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion

Abhishek Kumar Singh, Ioannis Patras

2024-04-26

Abstract: The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI. This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models. Utilizing ControlNet and LoRA fine-tuning, our approach generates high-quality images from multimodal inputs such as text and sketches. We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data. Our evaluation, utilizing metrics like FID, CLIP Score, and KID, demonstrates that our model significantly outperforms traditional stable diffusion models. The results not only highlight the effectiveness of our model in generating fashion-appropriate outputs but also underscore the potential of diffusion models in revolutionizing fashion design workflows. This research paves the way for more interactive, personalized, and technologically enriched methodologies in fashion design and representation, bridging the gap between creative vision and practical application.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the problem of helping fashion designers turn their ideas into images. The researchers introduce a new pipeline built on a latent diffusion model, combining ControlNet and LoRA fine-tuning, that generates high-quality images from text descriptions and sketches. They use the virtual try-on datasets Multimodal Dress Code and VITON-HD and extend them with sketches. Through comparative experiments evaluated with FID, CLIP Score, and KID, the paper shows that the proposed model outperforms traditional stable diffusion models at generating detailed, realistic clothing images that match the input conditions. The approach could make fashion design more interactive and personalized, and it is suited to applications such as automated design.

In summary, the main contributions of the paper are:

1. Implementation of a novel pipeline based on stable diffusion, LoRA, and ControlNet for fashion clothing generation guided by multimodal inputs such as text and sketches.
2. Introduction of a new generative model tailored for fashion designers, using a latent diffusion model for conditional modeling.
3. Expansion of the virtual try-on datasets with sketch information, and a proposed new evaluation metric that measures the structural similarity between generated images and input sketches.

This work builds on existing research in text-to-image synthesis, sketch-based image generation, and ControlNet, and applies these advances to the fashion design industry.
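Of the metrics named above, KID (Kernel Inception Distance) is the simplest to write out. The sketch below is illustrative only and is not taken from the paper: it computes the unbiased squared-MMD estimate with the cubic polynomial kernel k(x, y) = (x·y / d + 1)^3 on feature vectors that are assumed to have been extracted already (in practice, Inception features of real and generated images). The function names `poly_kernel` and `kid_score` are hypothetical, and the usual averaging over random subsets is omitted for brevity.

```python
from typing import Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, d = feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid_score(real: Sequence[Vector], fake: Sequence[Vector]) -> float:
    """Unbiased squared-MMD estimate between two sets of feature vectors.

    Assumption: `real` and `fake` already hold image features; this sketch
    does not run a feature extractor and does not average over subsets.
    """
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("need at least two feature vectors per set")

    # Within-set terms exclude the diagonal (i == j) to keep the estimate unbiased.
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j)
    # Cross term uses every (real, fake) pair.
    k_rf = sum(poly_kernel(x, y) for x in real for y in fake)

    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


if __name__ == "__main__":
    real_feats = [[0.0, 0.0], [1.0, 0.0]]   # toy 2-D "features"
    fake_feats = [[3.0, 3.0], [4.0, 3.0]]
    print(kid_score(real_feats, fake_feats))
```

Lower values indicate that the two feature distributions are closer. Because the estimator is unbiased, it can come out slightly negative when the two sets are nearly identical.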

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models

Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

New Fashion: Personalized 3D Design with a Single Sketch Input

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

Enhancing consistency in virtual try-on: A novel diffusion-based approach

DiCTI: Diffusion-based Clothing Designer via Text-guided Input

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

An intelligent generative method of fashion design combining attribute knowledge and Stable Diffusion Model

Harnessing Multimodal AI for Creative Design: Performance Evaluation of Stable Diffusion and DALL-E 3 in Fashion Apparel and Typography

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

FashionFlow: Leveraging Diffusion Models for Dynamic Fashion Video Synthesis from Static Imagery

Improving Virtual Try-On with Garment-focused Diffusion Models

FashionMorph: Contextually Adaptive Clothing Replacement with CLIP, Segmentation, and Stable Diffusion

Improving Diffusion Models for Virtual Try-on

Fashion-VDM: Video Diffusion Model for Virtual Try-On

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

Improving Diffusion Models for Authentic Virtual Try-on in the Wild