Abstract: The fashion industry is increasingly leveraging computer vision and deep learning technologies to enhance online shopping experiences and operational efficiencies. In this paper, we address the challenge of generating high-fidelity tiled garment images, essential for personalized recommendations, outfit composition, and virtual try-on systems, from photos of garments worn by models. Inspired by the success of Latent Diffusion Models (LDMs) in image-to-image translation, we propose a novel approach utilizing a fine-tuned StableDiffusion model. Our method features a streamlined single-stage network design, which integrates garment-specific masks to isolate and process target clothing items effectively. By simplifying the network architecture through selective training of transformer blocks and removing unnecessary cross-attention layers, we significantly reduce computational complexity while achieving state-of-the-art performance on benchmark datasets like VITON-HD. Experimental results demonstrate the effectiveness of our approach in producing high-quality tiled garment images for both full-body and half-body inputs. Code and model are available at: https://github.com/ixarchakos/try-off-anyone


### What problem does this paper attempt to solve?

This paper addresses the problem of generating high-quality tiled garment images from photos of models wearing the garments. Specifically, the authors propose a method that uses a fine-tuned StableDiffusion model to generate these images. Such images matter for applications like personalized recommendation, outfit composition, and virtual try-on systems.

#### Background and Challenges

As the fashion industry increasingly adopts computer vision and deep learning to improve the online shopping experience and operational efficiency, high-quality tiled garment images have become particularly important. However, many online shopping platforms display only photos of models wearing the clothes and lack tiled views, which limits the user experience. Obtaining additional tiled images is both expensive and time-consuming, which is a major obstacle for retailers.

#### Solutions

To solve this problem, the authors propose the following:

1. **Single-stage network design**: The network architecture is simplified. By selectively training Transformer blocks and removing unnecessary cross-attention layers, the computational complexity is significantly reduced.
2. **Garment mask**: Garment-specific masks isolate the target clothing item so that it can be processed on its own, which improves generation quality.
3. **Pre-trained StableDiffusion base**: The pre-trained StableDiffusion v1.5 model is fine-tuned so that it is optimized specifically for generating high-fidelity tiled garment images.
4. **Fewer trainable parameters**: By training only the Transformer blocks in the U-Net, the number of trainable parameters drops from 815.45M to 267.24M, which greatly reduces memory requirements and compute consumption.
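The selective-training idea in the list above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the parameter names and sizes in `TOY_UNET_PARAMS` are invented, the `"transformer_blocks"` keyword is an assumed naming convention, and `select_trainable` and `trainable_fraction` are hypothetical helpers. Only the two parameter counts (815.45M and 267.24M) come from the summary.

```python
from typing import Dict, List

# Hypothetical parameter inventory: name -> number of scalar weights.
# Names and sizes are invented for illustration; a real U-Net has far more entries.
TOY_UNET_PARAMS: Dict[str, int] = {
    "down_blocks.0.resnets.0.conv1.weight": 400,
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight": 150,
    "mid_block.attentions.0.transformer_blocks.0.ff.net.0.weight": 100,
    "up_blocks.1.resnets.0.conv2.weight": 350,
}


def select_trainable(params: Dict[str, int],
                     keyword: str = "transformer_blocks") -> List[str]:
    """Return the names of parameters that would stay trainable.

    Assumption: transformer-block weights can be recognised by a substring
    in their name. Everything else would be frozen during fine-tuning.
    """
    return [name for name in params if keyword in name]


def trainable_fraction(params: Dict[str, int], trainable: List[str]) -> float:
    """Share of all weights that belong to the trainable subset."""
    total = sum(params.values())
    kept = sum(params[name] for name in trainable)
    return kept / total


# Counts reported in the summary, in millions of parameters.
REPORTED_TOTAL_M = 815.45
REPORTED_TRAINABLE_M = 267.24
# Roughly one third of the weights remain trainable.
REPORTED_FRACTION = REPORTED_TRAINABLE_M / REPORTED_TOTAL_M

if __name__ == "__main__":
    names = select_trainable(TOY_UNET_PARAMS)
    print("trainable in toy model:", names)
    print("toy fraction:", trainable_fraction(TOY_UNET_PARAMS, names))
    print("reported fraction: %.4f" % REPORTED_FRACTION)
```

In a real framework the same selection would typically be applied by disabling gradients on every parameter outside the chosen subset, so the optimizer only holds state for the trainable weights.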
#### Experimental Results

The experiments show that the method achieves state-of-the-art performance on the VITON-HD benchmark dataset and can generate high-quality tiled garment images from both full-body and half-body input images. The authors also conducted a detailed ablation study to verify the effectiveness of the method under different configurations, and analyzed how the number of seeds influences the quality and consistency of the generated images.

### Summary

Overall, the paper tackles a key technical problem in the fashion industry by proposing an efficient, high-quality method for generating tiled garment images, which supports personalized recommendation, outfit composition, and virtual try-on systems.

TryOffAnyone: Tiled Cloth Generation from a Dressed Person

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Cloth Interactive Transformer for Virtual Try-On

Enhancing consistency in virtual try-on: A novel diffusion-based approach

Improving Diffusion Models for Virtual Try-on

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

ClothFit: Cloth-Human-Attribute Guided Virtual Try-On Network Using 3D Simulated Dataset

Toward Realistic Virtual Try-on Through Landmark Guided Shape Matching

Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On

Dress-up: deep neural framework for image-based human appearance transfer

Improving Virtual Try-On with Garment-focused Diffusion Models

VTNCT: an Image-Based Virtual Try-on Network by Combining Feature with Pixel Transformation

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Design2Cloth: 3D Cloth Generation from 2D Masks

A Two-stage Personalized Virtual Try-on Framework with Shape Control and Texture Guidance