Abstract: Video virtual try-on aims to transfer a garment onto a person in a video. Previous methods typically focus on image-based virtual try-on, but applying them directly to videos often leads to temporal discontinuity because of inconsistencies between frames. The few existing attempts at video virtual try-on also suffer from unrealistic results and poor generalization. In light of previous research, we posit that the task of video virtual try-on can be decomposed into two key requirements: (1) single-frame results should be realistic and natural while remaining consistent with the garment; (2) the person's actions and the garment should be coherent throughout the entire video. To address these two requirements, we propose a novel two-stage framework based on the Latent Diffusion Model, namely Garment-Preserving Diffusion for Video Virtual Try-On (GPD-VVTO). In the first stage, the model is trained on single-frame data to improve its ability to generate high-quality try-on images. We integrate both low-level texture features and high-level semantic features of the garment into the denoising network to preserve garment details while ensuring a natural fit between the garment and the person. In the second stage, the model is trained on video data to enhance temporal consistency. We devise a novel Garment-aware Temporal Attention (GTA) module that incorporates garment features into temporal attention, enabling the model to maintain fidelity to the garment during temporal modeling. Furthermore, we collect a video virtual try-on dataset containing high-resolution videos from diverse scenes, addressing the limited variety of backgrounds and human actions in current datasets. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods on both image-based and video-based virtual try-on tasks, indicating the effectiveness of the proposed framework.
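The abstract does not specify how the GTA module combines garment features with temporal attention. As a rough illustration only, the sketch below shows one plausible reading: each spatial position attends across frames, and garment tokens are appended to the keys and values so that every frame can also attend to the garment. The function name, the tensor shapes, the single-head formulation, and the concatenation scheme are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def garment_aware_temporal_attention(frames, garment, w_q, w_k, w_v):
    """Hypothetical single-head sketch (not the authors' implementation).

    frames:  (T, N, C) features for T frames with N spatial positions each
    garment: (G, C) garment tokens (assumed to come from a garment encoder)
    w_q, w_k, w_v: (C, C) projection matrices
    Returns the attended features (T, N, C) and the weights (N, T, T + G).
    """
    T, N, C = frames.shape
    # Treat each spatial position as a sequence over time: (N, T, C).
    x = frames.transpose(1, 0, 2)
    # Assumption: append the same garment tokens to every temporal sequence.
    g = np.broadcast_to(garment, (N,) + garment.shape)
    context = np.concatenate([x, g], axis=1)          # (N, T + G, C)

    q = x @ w_q                                       # (N, T, C)
    k = context @ w_k                                 # (N, T + G, C)
    v = context @ w_v                                 # (N, T + G, C)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(C)    # (N, T, T + G)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                    # (N, T, C)
    return out.transpose(1, 0, 2), attn

# Example usage with random features (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 6, 8))    # 4 frames, 6 positions, 8 channels
garment = rng.normal(size=(3, 8))      # 3 garment tokens
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = garment_aware_temporal_attention(frames, garment, w_q, w_k, w_v)
```

In this reading, the last G columns of the attention weights show how strongly each frame draws on the garment tokens rather than on other frames, which is one way temporal modeling could stay anchored to the garment.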

Improving Virtual Try-On with Garment-focused Diffusion Models

Improving Diffusion Models for Virtual Try-on

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models

Enhancing consistency in virtual try-on: A novel diffusion-based approach

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

A Two-stage Personalized Virtual Try-on Framework with Shape Control and Texture Guidance

A 3D Virtual Try-On Method with Global-Local Alignment and Diffusion Model

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

Fashion-VDM: Video Diffusion Model for Virtual Try-On

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

ViViD: Video Virtual Try-on using Diffusion Models

IMAGDressing-v1: Customizable Virtual Dressing

GPD-VVTO: Preserving Garment Details in Video Virtual Try-On

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on