Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design

Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, MiaoMiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan
2024-03-18
Abstract: The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the lack of large-scale, high-quality image datasets with detailed paired text descriptions at the intersection of artificial intelligence and fashion design. Specifically:

1. **Dataset scale and quality**:
   - **Limitations of existing datasets**: Existing fashion image datasets such as Prada and DeepFashion-MM contain images and text descriptions, but they are small in scale (fewer than 100,000 images) and lack detailed fine-grained attribute descriptions. These deficiencies limit the ability to train high-quality fashion design models.
   - **Requirement for new datasets**: To fully realize the potential of artificial intelligence in fashion design, a dataset is required that contains a large number of high-resolution, high-quality images paired with detailed text descriptions.
2. **Diversity and standardization of datasets**:
   - **Diversity**: Existing datasets often come from specific geographical locations and cultural backgrounds and lack global representation. A dataset that contains images and text descriptions from different geographical locations and cultural backgrounds is therefore required.
   - **Standardization**: To promote the standardization of text-to-image (T2I) based fashion design research, a new benchmark is required to evaluate the performance of fashion design models.

### Solutions

To solve the above problems, the authors propose the Fashion-Diffusion dataset, which has the following characteristics:

1. **Large-scale**: It contains more than 1 million high-resolution (768×1152), high-quality fashion images.
2. **Detailed text descriptions**: Each image is accompanied by detailed text descriptions covering the fine-grained attributes of clothing and the human body.
3. **Global representation**: The images are drawn from a wide range of sources, covering different geographical locations and cultural backgrounds and reflecting global fashion trends.
4. **Diversity**: It contains human body images of various ages, genders and races, as well as clothing in 52 fine-grained categories.
5. **High quality**: The images and text descriptions are closely aligned, with a CLIPScore of 0.83.
6. **Benchmark**: A new benchmark comprising multiple subsets is proposed for evaluating the performance of fashion design models.

### Experimental results

The experimental results show that the Fashion-Diffusion dataset is superior to existing datasets in both quality and quantity. Specific indicators are as follows:

- **FID**: 8.33 (compared with 15.32 for Prada)
- **IS**: 6.95 (compared with 4.7 for Prada)
- **CLIPScore**: 0.83 (compared with 0.70 for Prada)

Lower FID is better, while higher IS and CLIPScore are better. These results indicate that the Fashion-Diffusion dataset not only performs well in image quality but also has significant advantages in generating fine-grained attributes. Fine-tuning current T2I models (such as Stable Diffusion) on this dataset can significantly improve their generation performance.
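
The CLIPScore values quoted above measure how well an image matches its text description. As a rough illustration only, the sketch below computes the quantity that underlies this metric: the cosine similarity between an image embedding and a text embedding, clipped at zero. It assumes the embeddings have already been produced by a CLIP model; the function names, the `weight` parameter, and the toy vectors are hypothetical and not taken from the paper. The original CLIPScore definition rescales the similarity by 2.5, and this summary does not say whether the reported 0.83 uses that rescaling, so the weight defaults to 1.0 here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_style_score(
    image_emb: Sequence[float],
    text_emb: Sequence[float],
    weight: float = 1.0,  # assumption: 2.5 in the original CLIPScore definition
) -> float:
    """Image-text alignment: weight * max(cos(image, text), 0)."""
    return weight * max(cosine_similarity(image_emb, text_emb), 0.0)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real CLIP embeddings.
    image_embedding = [0.6, 0.8, 0.0]
    text_embedding = [0.8, 0.6, 0.0]
    print(round(clip_style_score(image_embedding, text_embedding), 2))
```

A dataset-level figure would then be the mean of this score over all image-text pairs, which is one plausible reading of how a single value such as 0.83 is obtained.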