Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images

Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, Svetlana Lazebnik
2024-07-17
Abstract: Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results for studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing a StreetTryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method to learn virtual try-on from a set of in-the-wild person images directly without requiring paired data. We tackle the unique challenges, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, by a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance for standard studio try-on tasks and SOTA performance for street try-on and cross-domain try-on tasks.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The paper addresses virtual try-on in real-world ("in-the-wild") scenarios. Existing virtual try-on methods focus on studio settings and are trained on paired data (a garment image paired with a photo of a model wearing that garment), so they perform well under controlled conditions. Applied to in-the-wild photos, where paired data is hard to obtain and backgrounds and poses vary widely, they perform poorly. The main contributions of the paper are:

1. **A new benchmark dataset**: To evaluate virtual try-on in real-world scenarios, the authors introduce StreetTryOn. The dataset is filtered from the existing DeepFashion2 dataset and contains a large number of street photos, which makes it suitable for evaluating try-on under real-world conditions.
2. **A new method**: The paper proposes a method that learns virtual try-on from unpaired in-the-wild person images. It warps garments using DensePose, corrects the warp, and then applies diffusion-based conditional inpainting to handle diverse human poses and complex backgrounds.

In the experiments, the method is competitive on standard studio try-on tasks and achieves state-of-the-art performance on the more complex street try-on tasks (Shop2Street and Street2Street). It also has advantages in limb reconstruction and background rendering, producing high-quality try-on results without relying on paired data.
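To make the warp-then-inpaint idea more concrete, here is a minimal NumPy sketch of the general technique, not the authors' implementation. It assumes each garment pixel carries a DensePose-style surface coordinate (U, V) in [0, 1], moves colours from the source pose to the target pose through a shared UV texture, and then marks the uncovered pixels as the region an inpainting model would have to fill. The function names `warp_by_uv` and `inpaint_inputs`, the single-part UV map, and the texture resolution are all illustrative assumptions; the paper's learned warping correction and its diffusion model are not reproduced.

```python
import numpy as np


def _texel(uv, res):
    """Quantise UV coordinates in [0, 1] to integer texel indices (u, v)."""
    idx = np.clip((uv * (res - 1)).round().astype(int), 0, res - 1)
    return idx[..., 0], idx[..., 1]


def warp_by_uv(src_img, src_uv, src_mask, tgt_uv, tgt_mask, res=32):
    """Move garment pixels from a source pose to a target pose via UV space.

    src_img:  (H, W, 3) float image.
    src_uv, tgt_uv: (H, W, 2) surface coordinates in [0, 1] (assumed input,
        e.g. derived from a DensePose prediction).
    src_mask, tgt_mask: (H, W) bool, True where the UV value is valid.
    Returns the warped image and a bool mask of target pixels that got a colour.
    """
    # Scatter: average all source colours that fall into the same texel.
    tex_sum = np.zeros((res, res, 3))
    tex_cnt = np.zeros((res, res))
    su, sv = _texel(src_uv, res)
    np.add.at(tex_sum, (sv[src_mask], su[src_mask]), src_img[src_mask])
    np.add.at(tex_cnt, (sv[src_mask], su[src_mask]), 1)
    filled = tex_cnt > 0
    tex = np.zeros_like(tex_sum)
    tex[filled] = tex_sum[filled] / tex_cnt[filled][:, None]

    # Gather: every valid target pixel reads the texel at its own UV.
    tu, tv = _texel(tgt_uv, res)
    valid = tgt_mask & filled[tv, tu]
    warped = np.zeros_like(src_img)
    warped[valid] = tex[tv[valid], tu[valid]]
    return warped, valid


def inpaint_inputs(person_img, warped, valid, old_garment_mask):
    """Build the masked composite that a conditional inpainter would complete.

    Pixels of the old garment that the warp did not cover become holes.
    """
    composite = np.where(valid[..., None], warped, person_img)
    hole = old_garment_mask & ~valid
    composite[hole] = 0.0
    return composite, hole
```

In this sketch the hole mask is exactly where a plain UV lookup fails (occluded or unseen surface regions), which is the kind of gap that the correction and inpainting stages described above are meant to close.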