Abstract: We tackle the task of street-to-shop clothing image synthesis. Given an everyday photo of a person wearing a particular clothing item in a street scenario, we aim to synthesize the frontal-facing view of that item in a shop scenario. This problem poses the following challenges: 1) the distinct visual discrepancy between the street and shop scenarios; 2) the severe shape deformation of clothing under arbitrary human poses; 3) the preservation of fine-grained details during clothing image generation. In this paper, we jointly address these challenges by proposing a Pose-Normalized and Appearance-Preserved Generative Adversarial Network (PNAP-GAN). More specifically, conditioned on a clothing-agnostic representation (i.e., clothing landmarks and a semantic parsing map), we disentangle shape and appearance synthesis in a coarse-to-fine framework. Moreover, a semantic embedding loss is introduced to guide the domain transfer at the semantic level (i.e., keeping the clothing attributes). Using the synthesized frontal shop image, we integrate a pose-normalized representation with the complementary domain-invariant feature learnt from the original street image to facilitate street-to-shop clothing retrieval. Extensive experiments demonstrate the effectiveness of the proposed PNAP-GAN in generating high-quality frontal-view images and the advantage of the learnt pose-normalized features over existing methods on the retrieval task. In addition, we demonstrate that the pose-normalized retrieval feature benefits cross-scenario (i.e., street-to-shop) clothing image generation in a semantics-preserving manner.

Toward Multi-Modal Conditioned Fashion Image Translation

Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models

A one-to-many conditional generative adversarial network framework for multiple image-to-image translations

PAINT: Photo-realistic Fashion Design Synthesis

Poly-GAN: Multi-Conditioned GAN for Fashion Synthesis

Photo-realistic Image Synthesis from Lines and Appearance with Modular Modulation

Multimodal Conditional Image Synthesis with Product-of-Experts GANs

Toward Multimodal Image-to-Image Translation

Harnessing the Conditioning Sensorium for Improved Image Translation

Pose-Normalized and Appearance-Preserved Street-to-Shop Clothing Image Generation and Feature Learning

M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

Conditional Image-to-Image Translation

Verbal-Person Nets: Pose-Guided Multi-Granularity Language-to-Person Generation

Multimodal Face Synthesis From Visual Attributes

Towards Photo-Realistic Virtual Try-On by Adaptively Generating$\leftrightarrow$Preserving Image Content

Multi-Pose Virtual Try-On Via Self-Adaptive Feature Filtering

Pose- and Attribute-consistent Person Image Synthesis

Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis