TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong Liu, Jingdong Wang
2024-04-01
Abstract: Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, the clothing identity uncontrollability and training inefficiency of existing diffusion-based methods, which struggle to maintain the identity even with full parameter training, are significant limitations that hinder widespread application. In this work, we propose an effective and efficient framework, termed TryOn-Adapter. Specifically, we first decouple clothing identity into fine-grained factors: style for color and category information, texture for high-frequency details, and structure for smooth spatial adaptive transformation. Our approach utilizes a pre-trained exemplar-based diffusion model as the fundamental network, whose parameters are frozen except for the attention layers. We then customize three lightweight modules (Style Preserving, Texture Highlighting, and Structure Adapting) incorporated with fine-tuning techniques to enable precise and efficient identity control. Meanwhile, we introduce the training-free T-RePaint strategy to further enhance clothing identity preservation while maintaining the realistic try-on effect during inference. Our experiments demonstrate that our approach achieves state-of-the-art performance on two widely used benchmarks. Additionally, compared with recent full-tuning diffusion-based methods, we only use about half of their tunable parameters during training. The code will be made publicly available at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two problems in diffusion-based virtual try-on: uncontrollable clothing identity and inefficient training. Existing diffusion-based methods struggle to preserve clothing identity even when all parameters are trained. To address this, the paper introduces a framework called TryOn-Adapter. The framework decouples clothing identity into three fine-grained factors: style (color and category information), texture (high-frequency details such as patterns, logos, and text), and structure (smooth, spatially adaptive transformation). It uses a pre-trained exemplar-based diffusion model as the base network and freezes all of its parameters except those of the attention layers. On top of this base, TryOn-Adapter adds three lightweight modules (Style Preserving, Texture Highlighting, and Structure Adapting), combined with fine-tuning techniques, to achieve precise and efficient control of clothing identity. The paper also introduces T-RePaint, a training-free strategy applied at inference that further improves clothing identity preservation while keeping the try-on result realistic. Compared with recent full-tuning diffusion-based methods, TryOn-Adapter uses only about half as many tunable parameters during training, and it achieves state-of-the-art performance on two widely used benchmarks. TryOn-Adapter therefore improves both identity controllability and training efficiency without sacrificing performance.
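The idea of freezing a pre-trained network except for its attention layers can be illustrated with a small, framework-free sketch. It is not the authors' code: the parameter names, the sizes, and the `"attn"` naming convention are all hypothetical, and the resulting fraction is a toy number unrelated to the figures reported in the paper. In a real deep-learning framework the same selection would typically be done by toggling each parameter's gradient flag.

```python
from typing import Dict, List, Tuple


def split_trainable(param_sizes: Dict[str, int],
                    keyword: str = "attn") -> Tuple[List[str], List[str]]:
    """Partition parameter names into (trainable, frozen).

    Assumption: attention parameters can be recognised by a substring
    in their name (here "attn"); real models may use other conventions.
    """
    trainable = [name for name in param_sizes if keyword in name]
    frozen = [name for name in param_sizes if keyword not in name]
    return trainable, frozen


def trainable_fraction(param_sizes: Dict[str, int],
                       keyword: str = "attn") -> float:
    """Fraction of all parameter elements that would remain tunable."""
    trainable, _ = split_trainable(param_sizes, keyword)
    total = sum(param_sizes.values())
    return sum(param_sizes[name] for name in trainable) / total


# Hypothetical toy inventory: parameter name -> number of elements.
# These values are invented for illustration only.
toy_params = {
    "down.0.resnet.conv.weight": 600,
    "down.0.attn.to_q.weight": 200,
    "down.0.attn.to_k.weight": 200,
    "mid.resnet.conv.weight": 400,
    "mid.attn.to_v.weight": 200,
    "up.0.resnet.conv.weight": 400,
}

trainable_names, frozen_names = split_trainable(toy_params)
print("trainable:", trainable_names)
print("frozen:", frozen_names)
print("tunable fraction:", trainable_fraction(toy_params))
```

Selecting by name keeps the frozen backbone untouched, so any added lightweight modules and the attention layers are the only parts that would receive updates.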