Abstract: The fashion industry is on the brink of radical transformation. The emergence of Artificial Intelligence (AI) in fashion applications creates many opportunities for the industry and makes fashion a better space for everyone. Motivated by this, we propose a virtual try-on interface to stimulate consumers' purchase intentions and facilitate their online buying decisions. In this paper, we present a flexible person generation system for virtual try-on that addresses the task of human appearance transfer across images while preserving the texture details and structural coherence of the generated outfit. This task has drawn increasing attention and driven substantial development of intelligent fashion applications. However, it poses several challenges, especially when the source and target images diverge widely. To address this problem, we propose Dress-up, a flexible person generation framework for the 2D virtual try-on task. Dress-up is an end-to-end generation pipeline with three modules based on image-to-image translation; it sequentially interchanges garments between images and produces dressing effects not achievable by existing works. The core idea of our solution is to explicitly encode the body pose and the target clothes with a pre-processing module based on semantic segmentation. A conditional adversarial network then generates the target segmentation, which is fed to the alignment and translation networks to generate the final output. The novelty of this work lies in achieving high-quality appearance transfer across images by reconstructing garments on a person in different orders and looks from semantic maps and 2D images alone, without 3D modeling. Our system produces dressing effects and achieves significant improvements over state-of-the-art methods on the widely used DeepFashion dataset.
Extensive evaluations show that Dress-up outperforms other recent methods in terms of output quality and handles a wide range of editing functions for which there is no direct supervision. Several types of results were computed to verify the performance of the proposed framework, and they indicate that the method is both robust and effective.
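
The data flow described in the abstract (semantic parsing, target segmentation generation, then alignment and translation) can be sketched as a simple composition of stages. This is only an illustrative sketch under stated assumptions: the class name `TryOnPipeline`, the stage names, and the exact inputs each stage receives are hypothetical and are not taken from the Dress-up implementation; the stages are placeholders for trained networks.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TryOnPipeline:
    """Hypothetical wiring of the stages named in the abstract.

    Every field is a placeholder callable; the argument lists are assumptions.
    """
    parse: Callable[[Any], Any]                # image -> semantic segmentation map
    predict_layout: Callable[[Any, Any], Any]  # conditional generator: parsings -> target segmentation
    align: Callable[[Any, Any], Any]           # (garment image, target segmentation) -> aligned garment
    translate: Callable[[Any, Any, Any], Any]  # (person, aligned garment, target segmentation) -> output

    def run(self, person_image: Any, garment_image: Any) -> Any:
        # Pre-processing: encode both inputs as semantic maps.
        person_seg = self.parse(person_image)
        garment_seg = self.parse(garment_image)
        # Generate the target segmentation from the two parsings.
        target_seg = self.predict_layout(person_seg, garment_seg)
        # The target segmentation conditions both remaining networks.
        aligned = self.align(garment_image, target_seg)
        return self.translate(person_image, aligned, target_seg)


# Usage with stub stages that only record the order in which they are called.
trace: List[str] = []


def stub(name: str) -> Callable[..., str]:
    def stage(*args: Any) -> str:
        trace.append(name)
        return f"{name}({', '.join(map(str, args))})"
    return stage


pipeline = TryOnPipeline(stub("parse"), stub("layout"), stub("align"), stub("translate"))
result = pipeline.run("person", "garment")
print(trace)
```

Keeping each stage behind a plain callable makes the ordering explicit: the target segmentation is computed once and reused by both the alignment and translation stages, which mirrors the description above without committing to any particular network architecture.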

Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models

Deforming Garment Classification with Shallow Temporal Extraction and Tree-Based Fusion

MMFashion: An Open-Source Toolbox for Visual Fashion Analysis

FaceXFormer: A Unified Transformer for Facial Analysis

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

DETR-based Layered Clothing Segmentation and Fine-Grained Attribute Recognition

Fusing Hierarchical Convolutional Features for Human Body Segmentation and Clothing Fashion Classification

Masked Vision-Language Transformer in Fashion

Fashion Meets Computer Vision

Dress-up: deep neural framework for image-based human appearance transfer

ClothSeg: semantic segmentation network with feature projection for clothing parsing

Two-Stream Multi-Task Network for Fashion Recognition

Garment4D: Garment Reconstruction from Point Cloud Sequences

A semantic segmentation algorithm for fashion images based on modified mask RCNN

UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation

HAIFIT: Human-to-AI Fashion Image Translation

DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models

AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction