Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models

Phuong Dam, Jihoon Jeong, Anh Tran, Daeyoung Kim
2024-07-17
Abstract: This study discusses the critical issues of Virtual Try-On in contemporary e-commerce and the prospective metaverse, emphasizing the challenges of preserving intricate texture details and distinctive features of the target person and the clothes in various scenarios, such as clothing texture and identity characteristics like tattoos or accessories. In addition to the fidelity of the synthesized images, the efficiency of the synthesis process presents a significant hurdle. Various existing approaches are explored, highlighting the limitations and unresolved aspects, e.g., identity information omission, uncontrollable artifacts, and low synthesis speed. It then proposes a novel diffusion-based solution that addresses garment texture preservation and user identity retention during virtual try-on. The proposed network comprises two primary modules - a warping module aligning clothing with individual features and a try-on module refining the attire and generating missing parts integrated with a mask-aware post-processing technique ensuring the integrity of the individual's identity. It demonstrates impressive results, surpassing the state-of-the-art in speed by nearly 20 times during inference, with superior fidelity in qualitative assessments. Quantitative evaluations confirm comparable performance with the recent SOTA method on the VITON-HD and Dresscode datasets. We named our model Fast and Identity Preservation Virtual TryON (FIP-VITON).
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses three key problems in virtual try-on:

1. **Texture Detail and Feature Preservation**: How to preserve the intricate texture details of the clothing and the distinctive features of the target person (such as tattoos or accessories) across different scenarios. In particular, how to keep the clothing pattern intact when fitting it to different body types, especially under large changes in pose or body shape.
2. **Fidelity of the Composite Image**: Beyond preserving texture and features, the overall quality of the composite image is itself a challenge. Existing methods often fail to maintain clothing textures and symbols and are prone to producing uncontrollable artifacts.
3. **Efficiency of the Synthesis Process**: Beyond image quality, synthesis speed is a major obstacle. Existing methods generate images too slowly, which degrades the user experience.

### Main Contributions of the Paper

1. **A New Virtual Try-On Technique**: The technique generates realistic results in multiple scenarios while preserving the texture details of the clothing and the identity features of the user.
2. **A Time-Efficient Diffusion Model**: Guided by a conditional module, the model both adjusts and preserves clothing details and generates missing body parts.
3. **A Mask-Based Post-Processing Technique**: The technique preserves the identity features of the user and improves the overall fidelity of the generated image.

### Method Overview

The proposed method consists of two main modules, the **Alignment Module** (the warping module in the abstract) and the **Try-On Module**, combined with a post-processing technique.
- **Alignment Module**: This module aligns the clothing with the person's features. It takes clothing-specific information together with person-related information, including key points, dense pose images, and points of interest in specific areas (such as upper body, lower body, or full body).
- **Try-On Module**: This module refines the aligned clothing and generates the missing parts of the image. The generated image then undergoes a conditional, mask-aware post-processing technique that preserves the integrity of the person's identity.

### Experimental Results

The paper reports experiments on the VITON-HD and DressCode datasets. The method is nearly 20 times faster at inference than the current state-of-the-art methods, while showing higher fidelity in qualitative evaluation. Quantitative evaluation confirms that its performance on VITON-HD and DressCode is comparable to that of the most recent state-of-the-art method.

### Conclusion

By proposing a new diffusion model and a mask-aware post-processing technique, this paper addresses texture detail preservation, user identity retention, and generation efficiency in virtual try-on, offering a new direction for the development of virtual try-on technology.
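
To make the mask-aware post-processing idea from the method overview more concrete, the sketch below shows one common way such a step can work: compositing the generated image with the original person image through a binary mask, so that identity regions (for example a face, tattoo, or accessory) are copied from the original while the try-on region comes from the model output. The summary does not describe how FIP-VITON implements this step, so the function name `mask_aware_composite`, its parameters, and the simple linear blend are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def mask_aware_composite(person, generated, preserve_mask):
    """Hypothetical mask-based blend (an assumption, not the paper's code).

    person:        H x W x C original image of the target person
    generated:     H x W x C output of the try-on model
    preserve_mask: H x W (or H x W x 1) array, 1 where original pixels
                   must be kept (identity regions), 0 elsewhere
    """
    person = np.asarray(person, dtype=np.float32)
    generated = np.asarray(generated, dtype=np.float32)
    mask = np.asarray(preserve_mask, dtype=np.float32)

    if person.shape != generated.shape:
        raise ValueError("person and generated must have the same shape")
    if mask.ndim == 2:
        # Add a channel axis so the mask broadcasts over colour channels.
        mask = mask[..., None]

    # Keep original pixels where mask == 1, generated pixels where mask == 0.
    return mask * person + (1.0 - mask) * generated


if __name__ == "__main__":
    person = np.zeros((2, 2, 3))       # stand-in for the original photo
    generated = np.ones((2, 2, 3))     # stand-in for the model output
    keep = np.array([[1, 0], [0, 1]])  # preserve the diagonal pixels
    print(mask_aware_composite(person, generated, keep)[..., 0])
```

A soft (non-binary) mask could be passed in the same way to feather the boundary between preserved and generated regions; whether the paper does this is not stated in the summary.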