Abstract: Virtual try-on is a critical image synthesis task that aims to transfer clothes from one image to another while preserving the details of both humans and clothes. While many existing methods rely on Generative Adversarial Networks (GANs) to achieve this, flaws can still occur, particularly at high resolutions. Recently, the diffusion model has emerged as a promising alternative for generating high-quality images in various applications. However, simply using clothes as a condition for guiding the diffusion model to inpaint is insufficient to maintain the details of the clothes. To overcome this challenge, we propose an exemplar-based inpainting approach that leverages a warping module to guide the diffusion model's generation effectively. The warping module performs initial processing on the clothes, which helps to preserve the local details of the clothes. We then combine the warped clothes with the clothes-agnostic person image and add noise to form the input of the diffusion model. Additionally, the warped clothes are used as a local condition at each denoising step to ensure that the resulting output retains as much detail as possible. Our approach, namely Diffusion-based Conditional Inpainting for Virtual Try-ON (DCI-VTON), effectively utilizes the power of the diffusion model, and the incorporation of the warping module helps to produce high-quality and realistic virtual try-on results. Experimental results on VITON-HD demonstrate the effectiveness and superiority of our method.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses high-quality image synthesis in the Virtual Try-On task. The goal of virtual try-on is to transfer clothes from one image onto a person in another image while preserving the details of both the clothes and the human body. Many existing methods rely on Generative Adversarial Networks (GANs), but they still show shortcomings at high resolutions, such as loss of detail and lack of realism.

Diffusion Models have recently emerged as an alternative for generating high-quality images. However, directly using the clothes as a condition to guide the diffusion model for inpainting is insufficient to maintain the details of the clothes. To overcome this, the authors propose an exemplar-based inpainting method in which a warping module guides the diffusion model's generation. The warping module first processes the clothes, which helps retain their local details. The warped clothes are then combined with the clothes-agnostic person image, and noise is added to form the input to the diffusion model. In addition, the warped clothes serve as a local condition at each denoising step so that the output retains as much detail as possible.

The main contributions of the paper include:

1. **Proposing a new framework**: Diffusion-based Conditional Inpainting for Virtual Try-ON (DCI-VTON), which leverages the generative capabilities of diffusion models.
2. **Introducing the warping module**: The warping module preprocesses the clothes so that their local details are preserved, which supports high-quality, realistic results.
3. **Experimental validation**: Experimental results on the VITON-HD dataset demonstrate the effectiveness and superiority of the proposed method.

Through these innovations, the authors aim to generate high-quality, realistic composite images in the virtual try-on task, especially in high-resolution scenarios.
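The input-construction steps described above (compose the warped clothes with the clothes-agnostic person image, add noise, and supply the warped clothes as a local condition) can be illustrated with a rough NumPy sketch. It is not the authors' implementation. It works directly on pixel arrays, uses the standard DDPM forward-noising formula, and assumes the local condition is injected by channel-wise concatenation. The function names, the `clothes_mask` argument, and the array shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def compose_input(warped_clothes, agnostic_person, clothes_mask):
    """Paste the warped clothes onto the clothes-agnostic person image.

    Arrays are (C, H, W); clothes_mask is (1, H, W) with 1 where the warped
    clothes should appear. The mask is an assumed input, not from the source.
    """
    return clothes_mask * warped_clothes + (1.0 - clothes_mask) * agnostic_person


def add_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps


def denoiser_input(x_t, warped_clothes, clothes_mask):
    """Attach the local condition to the noisy image along the channel axis.

    Channel concatenation is an assumption about how the condition is fed in.
    """
    return np.concatenate([x_t, warped_clothes, clothes_mask], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    warped = rng.random((3, 8, 8))      # stand-in for warping-module output
    agnostic = rng.random((3, 8, 8))    # stand-in for clothes-agnostic person
    mask = np.zeros((1, 8, 8))
    mask[:, 2:6, 2:6] = 1.0             # toy clothes region

    x0 = compose_input(warped, agnostic, mask)
    x_t, _ = add_noise(x0, alpha_bar_t=0.5, rng=rng)
    net_in = denoiser_input(x_t, warped, mask)
    print(net_in.shape)
```

In this sketch the same `warped` array would be passed to `denoiser_input` at every denoising step, which mirrors the summary's statement that the warped clothes act as a local condition throughout denoising.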

Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow

Improving Diffusion Models for Virtual Try-on

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

Enhancing consistency in virtual try-on: A novel diffusion-based approach

Improving Virtual Try-On with Garment-focused Diffusion Models

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

A Two-stage Personalized Virtual Try-on Framework with Shape Control and Texture Guidance

Slot-VTON: Subject-Driven Diffusion-Based Virtual Try-on with Slot Attention

Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models

A 3D Virtual Try-On Method with Global-Local Alignment and Diffusion Model

D$^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Toward Realistic Virtual Try-on Through Landmark Guided Shape Matching

C2F-CFN: Coarse-to-Fine ClothFlow Network for High-Fidelity Virtual Try-On

DP-VTON: Toward Detail-Preserving Image-Based Virtual Try-on Network