Abstract: This study discusses critical issues of virtual try-on in contemporary e-commerce and the prospective metaverse, emphasizing the challenge of preserving intricate details of both the garment and the target person across scenarios, such as clothing texture and identity characteristics like tattoos or accessories. Beyond the fidelity of the synthesized images, the efficiency of the synthesis process is a significant hurdle. Existing approaches are reviewed, highlighting their limitations and unresolved aspects, e.g., omitted identity information, uncontrollable artifacts, and low synthesis speed. The study then proposes a novel diffusion-based solution that addresses garment texture preservation and user identity retention during virtual try-on. The proposed network comprises two primary modules: a warping module that aligns clothing with individual features, and a try-on module that refines the attire and generates missing parts, integrated with a mask-aware post-processing technique that ensures the integrity of the individual's identity. The method is nearly 20 times faster at inference than the state of the art, with superior fidelity in qualitative assessments. Quantitative evaluations confirm performance comparable to the recent SOTA method on the VITON-HD and DressCode datasets. We named our model Fast and Identity Preservation Virtual TryON (FIP-VITON).

What problem does this paper attempt to address?

This paper attempts to solve several key problems in virtual try-on:

1. **Texture Detail and Feature Preservation**: How to preserve the complex texture details of clothing and the unique features of the target person (such as appearance and posture) across scenarios. In particular, how to keep the clothing pattern unchanged when adapting it to different body types, especially under large changes in body posture or shape.
2. **Fidelity of the Composite Image**: Besides preserving texture and features, the quality of the composite image is itself a challenge. Existing methods often fail to preserve clothing textures and symbols and are prone to producing uncontrollable artifacts.
3. **Efficiency of the Synthesis Process**: Existing methods generate images too slowly, which degrades the user experience.

### Main Contributions of the Paper

1. **A New Virtual Try-on Technique**: The technique generates realistic results in multiple scenarios while preserving the texture details of the clothing and the identifying features of the user.
2. **A Time-efficient Diffusion Model**: Guided by the conditional module, the model adjusts and preserves clothing details and also generates missing body parts.
3. **A Mask-based Post-processing Technique**: The technique preserves the identifying features of the user and improves the overall fidelity of the generated image.

### Method Overview

The proposed method contains two main modules, the **Alignment Module** and the **Try-on Module**, combined with a post-processing technique.

- **Alignment Module**: This module (the warping module in the abstract) aligns the clothing with the person's features. It takes garment-specific and person-related information as input, including key points, dense pose images, and points of interest in specific areas (such as upper body, lower body, or full body).
- **Try-on Module**: This module refines the aligned clothing and generates the missing parts of the image. The generated image then undergoes a conditional, mask-aware post-processing step that preserves the integrity of the person's identity.

### Experimental Results

The paper reports experiments on the VITON-HD and DressCode datasets. The method is nearly 20 times faster in generation speed than the current state-of-the-art methods, while showing higher fidelity in qualitative evaluation. Quantitative evaluation confirms that its performance on the VITON-HD and DressCode datasets is comparable to that of the most recent state-of-the-art methods.

### Conclusion

By proposing a new diffusion model and a mask-aware post-processing technique, this paper addresses texture detail preservation, user identity retention, and generation efficiency in virtual try-on, and offers a new direction for the development of virtual try-on technology.
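The mask-aware post-processing step is described above only at a high level. As an illustration of the general idea, and not the paper's actual implementation, the sketch below composites a generated try-on image with the original photo so that pixels flagged as identity regions (for example a face, tattoo, or accessory) are copied from the original. The function name `mask_aware_composite`, the argument names, and the simple linear blend are all assumptions made for this example.

```python
import numpy as np


def mask_aware_composite(generated, original, preserve_mask):
    """Blend two images of equal shape (H, W, C).

    Hypothetical helper, not taken from the paper: where preserve_mask
    is 1 the output pixel comes from the original photo, where it is 0
    the pixel comes from the generated try-on image. Fractional mask
    values give a soft blend at region boundaries.
    """
    generated = np.asarray(generated, dtype=np.float32)
    original = np.asarray(original, dtype=np.float32)
    mask = np.asarray(preserve_mask, dtype=np.float32)

    if generated.shape != original.shape:
        raise ValueError("generated and original must have the same shape")

    # Allow an (H, W) mask to broadcast over the channel axis.
    if mask.ndim == generated.ndim - 1:
        mask = mask[..., None]

    mask = np.clip(mask, 0.0, 1.0)
    return mask * original + (1.0 - mask) * generated
```

In practice the mask would come from a human-parsing or segmentation model, and feathering its edges before blending tends to hide seams. Both choices are outside what the summary specifies.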

Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
