DM-VTON: Distilled Mobile Real-time Virtual Try-On

Khoi-Nguyen Nguyen-Ngoc, Thanh-Tung Phan-Nguyen, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le
2023-08-26
Abstract: The fashion e-commerce industry has witnessed significant growth in recent years, prompting the exploration of image-based virtual try-on techniques to incorporate Augmented Reality (AR) experiences into online shopping platforms. However, existing research has largely overlooked a crucial aspect: the runtime of the underlying machine-learning model. While existing methods prioritize enhancing output quality, they often disregard execution time, which restricts their application to a limited range of devices. To address this gap, we propose Distilled Mobile Real-time Virtual Try-On (DM-VTON), a novel virtual try-on framework designed to achieve simplicity and efficiency. Our approach is based on a knowledge distillation scheme that leverages a strong Teacher network as supervision to guide a Student network without relying on human parsing. Notably, we introduce an efficient Mobile Generative Module within the Student network, significantly reducing the runtime while ensuring high-quality output. Additionally, we propose Virtual Try-on-guided Pose for Data Synthesis to address the limited pose variation observed in training images. Experimental results show that the proposed method can achieve 40 frames per second on a single Nvidia Tesla T4 GPU and only take up 37 MB of memory while producing almost the same output quality as other state-of-the-art methods. DM-VTON stands poised to facilitate the advancement of real-time AR applications, in addition to the generation of lifelike attired human figures tailored for diverse specialized training tasks. https://sites.google.com/view/ltnghia/research/DMVTON
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the slow execution and high memory consumption of existing virtual try-on methods. Although existing image-based virtual try-on methods can generate high-quality results, they usually take a long time to run and occupy a large amount of memory, which limits their use in real-time applications, especially on mobile devices. To solve these problems, the authors propose a new framework named Distilled Mobile Real-time Virtual Try-On (DM-VTON). The framework uses knowledge distillation: a powerful Teacher network guides the training of a Student network, and the Student network adopts a lightweight design that reduces running time and memory consumption while maintaining high output quality. In addition, to address the limited variation of human poses in the training data, the authors propose a data synthesis pipeline named Virtual Try-on-guided Pose for Data Synthesis (VTP-DS) to enrich pose diversity. In short, DM-VTON aims to improve the real-time performance and resource efficiency of virtual try-on, making it better suited to resource-constrained devices such as smartphones and tablets and thereby improving the online shopping experience.
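
The Teacher-guides-Student idea described above can be illustrated with a minimal sketch. The text here does not specify the actual DM-VTON training objective, so everything below is an assumption for illustration: images are flat lists of pixel values, the distance is a plain mean absolute (L1) error, and the names `l1_distance`, `student_loss`, and `distill_weight` are hypothetical.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized flat images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def student_loss(
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    ground_truth: Sequence[float],
    distill_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Illustrative objective: a ground-truth term plus a teacher-guided term."""
    # Supervised term: how far the Student output is from the real image.
    supervised = l1_distance(student_out, ground_truth)
    # Distillation term: how far the Student output is from the Teacher output.
    distilled = l1_distance(student_out, teacher_out)
    return supervised + distill_weight * distilled


if __name__ == "__main__":
    # Toy two-pixel "images" to show how the two terms combine.
    loss = student_loss(
        student_out=[0.5, 0.5],
        teacher_out=[0.5, 1.0],
        ground_truth=[1.0, 1.0],
    )
    print(f"combined student loss: {loss}")
```

In a sketch like this, only the Student is evaluated at inference time, which is why a lightweight Student can lower runtime and memory use even when the Teacher is large.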