D$^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Huiyu Zhou, Junyu Dong, Huaidong Zhang, Yong Du

2024-07-21

Abstract: In this paper, we introduce D$^4$-VTON, an innovative solution for image-based virtual try-on. We address challenges from previous studies, such as semantic inconsistencies before and after garment warping, and reliance on static, annotation-driven clothing parsers. Additionally, we tackle the complexities in diffusion-based VTON models when handling simultaneous tasks like inpainting and denoising. Our approach utilizes two key technologies: Firstly, Dynamic Semantics Disentangling Modules (DSDMs) extract abstract semantic information from garments to create distinct local flows, improving precise garment warping in a self-discovered manner. Secondly, by integrating a Differential Information Tracking Path (DITP), we establish a novel diffusion-based VTON paradigm. This path captures differential information between incomplete try-on inputs and their complete versions, enabling the network to handle multiple degradations independently, thereby minimizing learning ambiguities and achieving realistic results with minimal overhead. Extensive experiments demonstrate that D$^4$-VTON significantly outperforms existing methods in both quantitative metrics and qualitative evaluations, demonstrating its capability in generating realistic images and ensuring semantic consistency.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses two key issues in image-based virtual try-on (VTON):

1. **Limitations in the Garment Deformation Stage**:
   - Current methods typically use Thin Plate Splines (TPS) or appearance flows for garment deformation. These techniques mainly focus on global alignment and neglect local semantic changes, which distorts texture patterns.
   - Some methods mitigate this issue by segmenting the garment into regions, but this relies on annotated data, which makes training time-consuming and makes it difficult to define appropriate semantic regions.
2. **Complexity in the Synthesis Stage**:
   - Current methods usually employ Generative Adversarial Networks (GANs) or diffusion models for synthesis. GANs may produce unrealistic results, while diffusion models, although more stable, are hard to optimize when they must handle denoising and inpainting simultaneously.
   - Existing methods often lack specific objectives for each of these tasks, which leaves the network with an ambiguous learning target.

To address these issues, the paper proposes the D$^4$-VTON model, which combines dynamic semantics disentangling with a new paradigm based on a differential diffusion framework to achieve precise garment deformation and high-quality synthesis. Specifically, D$^4$-VTON uses Dynamic Semantics Disentangling Modules (DSDMs) to learn local flows independently and introduces a Differential Information Tracking Path (DITP) to separate the denoising and inpainting tasks, thereby reducing learning ambiguity and improving synthesis performance. Experimental results show that D$^4$-VTON significantly outperforms existing methods on multiple benchmarks in both quantitative and qualitative evaluations.
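The idea of warping a garment with several local flows instead of one global flow can be illustrated with a small sketch. This is not the paper's DSDM implementation, whose details are not given here. It only assumes that garment feature channels are split into equal groups and that each group is backward-warped by its own flow field. The names `warp_channel`, `warp_by_groups`, and `constant_flow` are hypothetical, and integer offsets with zero padding stand in for the fractional offsets and bilinear sampling that flow-based warping normally uses.

```python
from typing import List, Tuple

Channel = List[List[float]]          # one H x W feature channel
Flow = List[List[Tuple[int, int]]]   # per-pixel (dy, dx) sampling offsets (assumed integer)


def warp_channel(channel: Channel, flow: Flow) -> Channel:
    """Backward-warp one channel: out[y][x] = channel[y + dy][x + dx], zero if out of bounds."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = channel[sy][sx]
    return out


def warp_by_groups(features: List[Channel], flows: List[Flow]) -> List[Channel]:
    """Split channels into len(flows) equal groups and warp each group with its own flow."""
    if len(features) % len(flows) != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    per_group = len(features) // len(flows)
    return [warp_channel(ch, flows[i // per_group]) for i, ch in enumerate(features)]


def constant_flow(h: int, w: int, dy: int, dx: int) -> Flow:
    """Helper for the example: the same offset at every pixel."""
    return [[(dy, dx)] * w for _ in range(h)]


# Two identical channels, two groups: group 0 samples from the right, group 1 from below.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]
flows = [constant_flow(2, 2, 0, 1), constant_flow(2, 2, 1, 0)]
warped = warp_by_groups(feat, flows)
print(warped)
```

Although both input channels are identical, the two groups end up with different warped results because each follows its own flow. A single global flow cannot produce this kind of per-group difference.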


ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Improving Diffusion Models for Virtual Try-on

DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning

Enhancing consistency in virtual try-on: A novel diffusion-based approach

DP-VTON: Toward Detail-Preserving Image-Based Virtual Try-on Network

Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models

VTON-HF: High Fidelity Virtual Try-on Network Via Semantic Adaptation

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

MV-VTON: Multi-View Virtual Try-On with Diffusion Models

Improving Virtual Try-On with Garment-focused Diffusion Models

Toward Realistic Virtual Try-on Through Landmark Guided Shape Matching

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

Self-Adaptive Clothing Mapping Based Virtual Try-on

CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models