CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, Anan Liu
2024-04-26
Abstract: Generative Adversarial Networks (GANs) dominate the research field in image-based virtual try-on, but have not resolved problems such as unnatural deformation of garments and blurry generation quality. While the generative quality of diffusion models is impressive, achieving controllability poses a significant challenge when applying them to virtual try-on, and multiple denoising iterations limit their potential for real-time applications. In this paper, we propose Controllable Accelerated virtual Try-on with Diffusion Model (CAT-DM). To enhance controllability, a basic diffusion-based virtual try-on network is designed, which utilizes ControlNet to introduce additional control conditions and improves the feature extraction of garment images. In terms of acceleration, CAT-DM initiates the reverse denoising process with an implicit distribution generated by a pre-trained GAN-based model. Compared with previous try-on methods based on diffusion models, CAT-DM not only retains the pattern and texture details of the in-shop garment but also reduces the sampling steps without compromising generation quality. Extensive experiments demonstrate the superiority of CAT-DM over both GAN-based and diffusion-based methods in producing more realistic images and accurately reproducing garment patterns.
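The abstract says CAT-DM uses ControlNet to inject additional control conditions. A property of ControlNet in general (not something specific to CAT-DM) is that the control branch is connected to the frozen model through "zero convolutions": 1x1 convolutions whose weights and biases start at zero, so training begins from the pretrained model's exact behavior. The NumPy sketch below illustrates only that mechanism. The feature shapes, the random stand-in features, and the "trained" weight value are all hypothetical, and this is not CAT-DM's actual architecture.

```python
import numpy as np

def zero_conv_1x1(channels: int) -> tuple[np.ndarray, np.ndarray]:
    """Weights and bias of a 1x1 convolution, both initialized to zero."""
    return np.zeros((channels, channels)), np.zeros(channels)

def inject_control(base_feat, control_feat, weight, bias):
    """Add a control branch's features to the frozen branch through a 1x1 conv.

    base_feat, control_feat: arrays of shape (C, H, W).
    weight: (C_out, C_in) matrix; bias: (C_out,) vector.
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    projected = np.einsum("oc,chw->ohw", weight, control_feat) + bias[:, None, None]
    return base_feat + projected

rng = np.random.default_rng(0)
C, H, W = 4, 8, 6
base = rng.standard_normal((C, H, W))     # stand-in for frozen U-Net features
garment = rng.standard_normal((C, H, W))  # hypothetical garment-condition features

# At initialization the zero conv contributes nothing, so the pretrained
# model's output is preserved exactly.
w0, b0 = zero_conv_1x1(C)
out = inject_control(base, garment, w0, b0)

# After (hypothetical) training the weights are non-zero, and the
# condition starts to steer the features.
w1, b1 = 0.1 * np.eye(C), np.zeros(C)
out_trained = inject_control(base, garment, w1, b1)
```

Starting from zero lets the conditioning branch be added to a pretrained diffusion model without disturbing it at the first training step. The condition's influence then grows only as far as training pushes the weights.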
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address two key challenges in the virtual try-on task:

1. **Generation Quality**: Existing methods based on Generative Adversarial Networks (GANs) often produce unnatural garment distortions under complex poses, and their outputs are blurry, failing to retain clothing details such as patterns and textures. Diffusion-based methods generate higher-quality images, but they are still difficult to control, especially when it comes to preserving the complex textures and patterns of the target garment.
2. **Generation Speed**: Diffusion models require many sampling steps to generate high-quality images, which limits their use in real-time virtual try-on scenarios. Accelerating the sampling process while preserving generation quality is therefore an important research direction.

To address these challenges, the paper proposes CAT-DM (Controllable Accelerated virtual Try-on with Diffusion Model), which relies on two key techniques:

- **Garment-Conditioned Diffusion Model (GC-DM) to Enhance Controllability**: The ControlNet architecture introduces additional control conditions and improves the feature extraction of garment images, giving the model better control over the garment region during generation.
- **Truncation-Based Acceleration Strategy**: A pre-trained GAN-based model generates an initial try-on image, and noise is added to this image to form the starting point of the reverse denoising process. Because denoising starts from this intermediate point instead of from pure noise, far fewer sampling steps are required, which speeds up generation.

With these techniques, CAT-DM surpasses existing GAN-based and diffusion-based methods in generation quality while reducing the number of sampling steps, making it better suited to real-time virtual try-on applications.
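The control flow of the truncation-based acceleration can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes a DDPM-style forward process with a linear beta schedule. T = 1000 and a truncation step of 100 are illustrative values, not CAT-DM's settings. `placeholder_denoise_step` is a hypothetical stand-in for the trained diffusion model. The sketch noises the GAN output to step `t_trunc` with the closed-form forward process, then runs `t_trunc` reverse steps instead of T.

```python
import numpy as np

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_to_step(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_t) x_0, (1 - a_t) I), with t in 1..T."""
    a_t = alpha_bars[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

def placeholder_denoise_step(x, t):
    """Hypothetical stand-in for one reverse step of the trained diffusion model."""
    return x

def truncated_reverse(x_gan, t_trunc, alpha_bars, denoise_step, rng):
    """Start denoising at step t_trunc from a noised GAN output instead of at T."""
    x = noise_to_step(x_gan, t_trunc, alpha_bars, rng)
    steps_run = 0
    for t in range(t_trunc, 0, -1):  # t_trunc, t_trunc - 1, ..., 1
        x = denoise_step(x, t)
        steps_run += 1
    return x, steps_run

rng = np.random.default_rng(0)
T = 1000                                   # full schedule length (illustrative)
alpha_bars = linear_alpha_bars(T)
x_gan = rng.standard_normal((3, 64, 48))   # stand-in for a GAN try-on image
x_out, n_steps = truncated_reverse(x_gan, 100, alpha_bars,
                                   placeholder_denoise_step, rng)
print(f"ran {n_steps} of {T} reverse steps")  # prints "ran 100 of 1000 reverse steps"
```

In this sketch, `t_trunc` trades speed against how much the diffusion model can revise the GAN output. A smaller value keeps more of the GAN image, because `alpha_bars[t_trunc - 1]` is larger, and it also runs fewer reverse steps.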