FashionDiff: A Controllable Diffusion Model Using Pairwise Fashion Elements for Intelligent Design
Han Yan, Haijun Zhang, Xiangyu Mu, Jicong Fan, Zhao Zhang
DOI: https://doi.org/10.1145/3581783.3612127
2023-01-01
Abstract: The process of fashion design involves creative expression through various methods, including sketch drawing, brush painting, and choices of textures and colors, all of which are employed to characterize the originality and uniqueness of the designed fashion items. Despite recent advances in intelligence-driven fashion design, the complexity of the diverse elements of a fashion item, such as its texture, color, and shape, which are associated with the semantic information conveyed, continues to present challenges both in generating high-quality fashion images and in achieving a controllable editing process. To address this issue, we propose a unified framework, FashionDiff, that leverages the diverse elements in fashion items to generate new items. First, we collected a large number of fashion images spanning multiple categories and created pairwise data that couple each sketch with additional data, such as brush areas, textures, or colors. To eliminate semantic discrepancies between these pairwise datasets, we introduce a feature modulation fusion (FMFusion) process, which enables interactive communication among different images, allowing them to be fused into latent spaces at different resolutions. To produce high-quality editable fashion images, we develop a generator called FD-ControlNet, based on a state-of-the-art diffusion model, which integrates these latent spaces into different layers of the generator to generate ready-to-wear fashion items. Qualitative and quantitative experimental results demonstrate the effectiveness of our proposed method and suggest that our model offers flexible control over the generated images in terms of sketches, brush areas, textures, and colors.
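The abstract does not specify how FMFusion modulates features, so the following is only a minimal sketch of one common form of feature modulation: a FiLM-style scale and shift, applied at several resolutions. It assumes the sketch and the condition (for example, a texture map) are already single-channel feature grids of equal size. The function names `downsample`, `modulate`, and `fuse_multiscale`, and the weights `w_scale` and `w_shift`, are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]


def downsample(grid: Grid, factor: int) -> Grid:
    """Average-pool a square grid by an integer factor."""
    size = len(grid) // factor
    area = factor * factor
    return [
        [
            sum(
                grid[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ) / area
            for c in range(size)
        ]
        for r in range(size)
    ]


def modulate(sketch: Grid, condition: Grid,
             w_scale: float = 0.5, w_shift: float = 0.1) -> Grid:
    """Assumed FiLM-style fusion: the condition supplies a per-pixel
    scale (1 + w_scale * c) and shift (w_shift * c) for the sketch."""
    return [
        [s * (1.0 + w_scale * c) + w_shift * c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(sketch, condition)
    ]


def fuse_multiscale(sketch: Grid, condition: Grid,
                    factors: Sequence[int] = (1, 2)) -> Dict[int, Grid]:
    """Return one fused grid per downsampling factor (one per resolution)."""
    return {
        f: modulate(downsample(sketch, f), downsample(condition, f))
        for f in factors
    }
```

In a real model the fixed weights would be replaced by learned layers, and each fused grid would be injected into the generator layer of matching resolution; the dictionary keyed by factor only stands in for that routing.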