PLTON: Product-Level Try-on with Realistic Clothes Shading and Wrinkles

Yanlong Zang, Han Yang, Jiaxu Miao, Yi Yang
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650070
2024-01-01
Abstract: Image-based virtual try-on systems, which fit new garments onto human portraits, are gaining research attention. An ideal pipeline should preserve the static features of clothes (such as textures and logos) while also generating dynamic elements (e.g., shadows and folds) that adapt to the model's pose and environment. Previous works fail specifically at generating dynamic features, because they trivially preserve the warped in-shop clothes by compositing them with a predicted alpha mask. To break the dilemma between over-preservation and texture loss, we propose a novel diffusion-based Product-level virtual try-on pipeline, PLTON, which preserves the fine details of logos and embroideries while producing realistic clothes shading and wrinkles. The main insights are threefold: 1) Adaptive Dynamic Rendering: We take a pre-trained diffusion model as a generative prior and tame it with image features, training a dynamic extractor from scratch to generate dynamic tokens that preserve high-fidelity semantic information. Due to the strong generative power of the diffusion prior, we can generate realistic clothes shadows and wrinkles. 2) Static Characteristics Transformation: The High-frequency Map (HF-Map) is our fundamental insight for static representation. PLTON first warps in-shop clothes to the target model pose with a traditional warping network, and uses a high-pass filter to extract an HF-Map for preserving static cloth features. The HF-Map is used to generate modulation maps through our static extractor, which are injected into a fixed U-Net to synthesize the final result. To enhance retention, a Two-stage Blended Denoising method is proposed to guide the diffusion process toward the correct spatial layout and color. PLTON is fine-tuned only on our collected small-size try-on dataset. Extensive quantitative and qualitative experiments on 1024×768 datasets demonstrate the superiority of our framework in mimicking real clothes dynamics.
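As an illustration of the high-pass filtering step mentioned in the abstract, the sketch below extracts a high-frequency map from a grayscale image by subtracting a box-blurred (low-pass) copy from the original. The abstract does not specify which high-pass filter PLTON uses, so the box-blur subtraction, the function names `box_blur` and `high_frequency_map`, and the kernel size `k` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def box_blur(img, k=5):
    """Low-pass filter: mean over a k x k window (k odd), reflect-padded edges."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros(img.shape, dtype=np.float64)
    # Sum the k*k shifted copies of the image, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def high_frequency_map(img, k=5):
    """Hypothetical HF-Map: the image minus its low-pass (blurred) version.

    Assumption: a simple box blur stands in for whatever filter the paper uses.
    """
    img = np.asarray(img, dtype=np.float64)
    return img - box_blur(img, k)


if __name__ == "__main__":
    # Toy 2-D "warped garment": flat background with one sharp detail.
    garment = np.zeros((9, 9))
    garment[4, 4] = 1.0
    hf = high_frequency_map(garment)
    print(hf[4, 4], hf[0, 0])
```

Flat regions map to zero under this filter, while sharp details such as logo edges or embroidery strokes keep a large response, which is the property that makes a high-frequency map a plausible carrier for static garment features.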