Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images

Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, Svetlana Lazebnik

2024-07-17

Abstract: Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results for studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing a StreetTryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method to learn virtual try-on from a set of in-the-wild person images directly without requiring paired data. We tackle the unique challenges, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, by a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance for standard studio try-on tasks and SOTA performance for street try-on and cross-domain try-on tasks.

Computer Vision and Pattern Recognition, Graphics

What problem does this paper attempt to address?

The paper addresses virtual try-on in real-world scenarios. Existing virtual try-on methods focus on studio settings: they rely on paired data (a garment image paired with a photo of a model wearing that garment) and perform well under controlled conditions. Applied to in-the-wild photos, which have diverse backgrounds and poses and come without paired garment images, these methods perform poorly. The main contributions of the paper are:

1. **A new benchmark dataset**: To evaluate virtual try-on in real-world scenarios, the authors introduce the StreetTryOn benchmark. It is filtered from the existing DeepFashion2 dataset and contains a large number of street photos, making it suitable for evaluating try-on under real-world conditions.
2. **A new method**: The paper introduces a method that learns virtual try-on from unpaired real-world person images. It uses DensePose to warp garments and combines this with a diffusion model for conditional refinement, addressing the challenges posed by diverse human poses and complex backgrounds.

In the experiments, the method is competitive on standard studio try-on tasks and achieves state-of-the-art results on the more complex street try-on tasks (Shop2Street and Street2Street). It also has advantages in limb reconstruction and background rendering, enabling high-quality virtual try-on without paired data.
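To make the idea of DensePose-based garment warping concrete, here is a minimal NumPy sketch of the general technique: pixels are moved from a source person to a target pose by matching their DensePose (body part, u, v) coordinates. It is not the paper's warping correction method. The function name `warp_by_densepose`, the `grid` quantisation parameter, and the assumed IUV layout (part index in channel 0, u and v as floats in [0, 1]) are illustrative assumptions.

```python
import numpy as np

def warp_by_densepose(src_img, src_iuv, tgt_iuv, grid=32):
    """Move source pixels to a target pose via shared (part, u, v) coordinates.

    src_img: (H, W, 3) image of the source person.
    src_iuv, tgt_iuv: (H, W, 3) arrays; channel 0 is the body-part index
        (0 = background), channels 1 and 2 are u, v in [0, 1] (assumed layout).
    grid: hypothetical resolution used to quantise u and v into bins.
    Returns the warped image and a boolean mask of the target pixels filled.
    """
    def keys(iuv):
        # Collapse (part, u_bin, v_bin) into one integer key per pixel.
        part = iuv[..., 0].astype(np.int64)
        u = np.clip(np.round(iuv[..., 1] * (grid - 1)).astype(np.int64), 0, grid - 1)
        v = np.clip(np.round(iuv[..., 2] * (grid - 1)).astype(np.int64), 0, grid - 1)
        return part, (part * grid + u) * grid + v

    src_part, src_key = keys(src_iuv)
    tgt_part, tgt_key = keys(tgt_iuv)

    # Per-key texture table; when several source pixels share a key, one is kept.
    n_keys = (int(max(src_part.max(), tgt_part.max())) + 1) * grid * grid
    texture = np.zeros((n_keys, 3), dtype=src_img.dtype)
    known = np.zeros(n_keys, dtype=bool)

    src_fg = src_part > 0
    texture[src_key[src_fg]] = src_img[src_fg]
    known[src_key[src_fg]] = True

    # Fill only target body pixels whose surface coordinate was seen in the source.
    filled = (tgt_part > 0) & known[tgt_key]
    out = np.zeros_like(src_img)
    out[filled] = texture[tgt_key[filled]]
    return out, filled
```

Target pixels left outside the returned mask have no counterpart in the source image. In a pipeline like the one summarised above, such regions, together with the background, are the kind of content a conditional inpainting model would be asked to synthesise.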

Toward Realistic Virtual Try-on Through Landmark Guided Shape Matching

Better Fit: Accommodate Variations in Clothing Types for Virtual Try-on

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Image-Based Virtual Try-On: A Survey

Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing

Try-On-Adapter: A Simple and Flexible Try-On Paradigm

Virtually Trying on New Clothing with Arbitrary Poses

Virtual Try-On with Garment Self-Occlusion Conditions

Virtual Try-on Network with Attribute Transformation and Local Rendering.

FashionOn

PG-VTON: A Novel Image-Based Virtual Try-On Method Via Progressive Inference Paradigm

Virtual Try-on via Matching Relation with Landmark.

VTNCT: an Image-Based Virtual Try-on Network by Combining Feature with Pixel Transformation

PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns

LGVTON: A Landmark Guided Approach to Virtual Try-On

High-Resolution Virtual Try-On Network with Coarse-to-Fine Strategy

Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses

Deep Learning in Virtual Try-On: A Comprehensive Survey

Template-Free Try-on Image Synthesis via Semantic-guided Optimization