FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan

2024-05-19

Abstract: Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the poor detail fidelity of current virtual try-on (VTON) methods. Existing methods based on latent diffusion models (LDMs) generate realistic try-on images but fail to preserve clothing details such as style, pattern, and text. The paper attributes this to two causes: the stochastic nature of diffusion and the fact that supervision is applied only in latent space. To mitigate these issues, the authors propose a Faithful Latent Diffusion Model for VTON (FLDM-VTON), with three main improvements:

1. **Warped clothes as the starting point and local condition**: The warped clothes supply the model with a faithful clothes prior, both as the starting point of the diffusion process and as a local condition.
2. **A clothes flattening network**: A new network constrains the generated try-on images, providing clothes-consistent supervision on top of the latent supervision.
3. **Clothes-posterior sampling**: At inference time, the conventional clothes-agnostic Gaussian sampling is replaced by a clothes-posterior sampling, which further improves performance.

Experimental results show that FLDM-VTON outperforms state-of-the-art baselines on the VITON-HD and Dress Code datasets, generating photo-realistic try-on images while keeping clothing details faithful.
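To make the contrast between a clothes-agnostic Gaussian start and a clothes-informed start concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not give the paper's sampling equations, so the sketch assumes the clothes-informed start is obtained by noising a warped-clothes latent with the standard forward-diffusion marginal, x_T = sqrt(alpha_bar_T) * z + sqrt(1 - alpha_bar_T) * eps. The function names, the linear beta schedule, and the toy latent values are all hypothetical.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumptions for illustration.
    Requires num_steps >= 2.
    """
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def gaussian_start(size, rng):
    """Conventional clothes-agnostic start: pure standard Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def clothes_informed_start(warped_latent, alpha_bar_T, rng):
    """Assumed clothes-informed start: noise a warped-clothes latent to the
    final timestep using the standard forward-diffusion marginal."""
    signal = math.sqrt(alpha_bar_T)
    noise = math.sqrt(1.0 - alpha_bar_T)
    return [signal * z + noise * rng.gauss(0.0, 1.0) for z in warped_latent]


# Toy stand-in for a flattened warped-clothes latent (hypothetical values).
warped_latent = [0.5, -1.0, 2.0, 0.0]
alpha_bar = linear_alpha_bar(1000)

x_T_gaussian = gaussian_start(len(warped_latent), random.Random(0))
x_T_clothes = clothes_informed_start(warped_latent, alpha_bar[-1], random.Random(0))
```

Under this assumption the clothes-informed start still contains mostly noise at the final timestep, but its mean is shifted toward the warped clothes instead of being zero, which is one way a sampler could carry a clothes prior into inference.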

ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On

Enhancing consistency in virtual try-on: A novel diffusion-based approach

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow

Improving Virtual Try-On with Garment-focused Diffusion Models

D$^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Toward Realistic Virtual Try-on Through Landmark Guided Shape Matching

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Improving Diffusion Models for Virtual Try-on

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning

A Two-stage Personalized Virtual Try-on Framework with Shape Control and Texture Guidance

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models

DP-VTON: Toward Detail-Preserving Image-Based Virtual Try-on Network

Fashion-VDM: Video Diffusion Model for Virtual Try-On