Abstract: Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we define a virtual dressing (VD) task focused on generating freely editable human images with fixed garments and optional conditions. Meanwhile, we design a comprehensive affinity metric index (CAMI) to evaluate the consistency between generated images and reference garments. Then, we propose IMAGDressing-v1, which incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE. We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet, ensuring users can control different scenes through text. IMAGDressing-v1 can be combined with other extension plugins, such as ControlNet and IP-Adapter, to enhance the diversity and controllability of generated images. Furthermore, to address the lack of data, we release the interactive garment pairing (IGPair) dataset, containing over 300,000 pairs of clothing and dressed images, and establish a standard pipeline for data assembly. Extensive experiments demonstrate that our IMAGDressing-v1 achieves state-of-the-art human image synthesis performance under various controlled conditions. The code and model will be available at https://github.com/muzishen/IMAGDressing.

What problem does this paper attempt to address?

This paper addresses the limited flexibility and editability of existing virtual try-on (VTON) technology, which cannot meet merchants' need to showcase garments comprehensively. Existing VTON methods focus on localized image inpainting for a given garment on a fixed person. This improves the online shopping experience for consumers, but it ignores the aspects of a garment display that merchants need to control, such as faces, poses, and scenes. To close this gap, the paper defines a new task, virtual dressing (VD), which aims to generate freely editable human images with a fixed garment and optional conditions, enabling a more comprehensive and personalized garment display.

### Main Contributions

1. **Defined a new virtual dressing (VD) task**: In response to merchants' needs, the task is to generate freely editable human images with a fixed garment and optional conditions.
2. **Designed a comprehensive affinity metric index (CAMI)**: Evaluates the consistency between generated images and the reference garment.
3. **Proposed the IMAGDressing-v1 model**: Combines a trainable garment UNet with a frozen denoising UNet, and integrates garment features and text-prompt control through a hybrid attention mechanism.
4. **Released a large-scale interactive garment pairing dataset (IGPair)**: Contains more than 300,000 pairs of garment and dressed images to support community research.

### Solutions

- **IMAGDressing-v1 model**:
  - **Garment UNet**: Extracts semantic features from CLIP and texture features from VAE to capture fine-grained garment features.
  - **Denoising UNet**: A frozen UNet that receives garment features and text prompts through a hybrid attention mechanism, allowing scenes to be controlled through text.
  - **Hybrid attention module**: Combines a frozen self-attention and a trainable cross-attention to balance the influence of garment features and text prompts.
- **Dataset**:
  - **IGPair dataset**: Contains high-resolution images, diverse scenes and styles, and detailed text descriptions, meeting the requirements of the VD task.

### Experimental Results

- **Quantitative results**: IMAGDressing-v1 outperforms existing state-of-the-art methods on multiple evaluation metrics, especially on CAMI.
- **Qualitative results**: IMAGDressing-v1 follows text prompts faithfully while retaining fine-grained garment details.

### Summary

This paper addresses the limitations of existing VTON technology in merchant applications by defining the virtual dressing (VD) task and proposing the IMAGDressing-v1 model, which provides a more comprehensive and flexible garment display solution. The released IGPair dataset also provides a resource for related research.
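The hybrid attention module described above, a frozen self-attention combined with a trainable cross-attention over garment features, can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `HybridAttention`, the `garment_scale` weight, the reuse of the self-attention query for the garment branch, and the additive combination of the two branches are all assumptions made for illustration, and random matrices stand in for pretrained weights.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), values)


def random_matrix(rows, cols, rng):
    """Random weights or features; a stand-in for real tensors."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class HybridAttention:
    """Hypothetical hybrid attention layer (illustrative only)."""

    def __init__(self, dim, rng, garment_scale=1.0):
        # Frozen branch: self-attention projections that training would not update.
        self.frozen = {name: random_matrix(dim, dim, rng) for name in ("q", "k", "v")}
        # Trainable branch: key/value projections applied to garment features.
        self.trainable = {name: random_matrix(dim, dim, rng) for name in ("k", "v")}
        # Assumed scalar that weights the garment branch against the frozen branch.
        self.garment_scale = garment_scale

    def __call__(self, hidden, garment):
        # Assumption: both branches share the query from the frozen projection.
        q = matmul(hidden, self.frozen["q"])
        self_out = attention(
            q, matmul(hidden, self.frozen["k"]), matmul(hidden, self.frozen["v"])
        )
        cross_out = attention(
            q, matmul(garment, self.trainable["k"]), matmul(garment, self.trainable["v"])
        )
        # Assumption: the two branches are combined by a weighted sum.
        return [
            [s + self.garment_scale * c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(self_out, cross_out)
        ]


rng = random.Random(0)
layer = HybridAttention(dim=4, rng=rng, garment_scale=0.5)
hidden = random_matrix(3, 4, rng)   # 3 tokens from the denoising branch
garment = random_matrix(5, 4, rng)  # 5 tokens from the garment branch
out = layer(hidden, garment)        # one output row per denoising token
```

In this sketch, setting `garment_scale` to 0 reduces the layer to the frozen self-attention alone, which is one way to read the summary's point about balancing garment features against the rest of the model.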

IMAGDressing-v1: Customizable Virtual Dressing

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

PG-VTON: A Novel Image-Based Virtual Try-On Method Via Progressive Inference Paradigm

Toward Accurate and Realistic Outfits Visualization with Attention to Details

Enhancing consistency in virtual try-on: A novel diffusion-based approach

Toward Realistic Virtual Try-on Through Landmark Guided Shape Matching

DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning

Improving Virtual Try-On with Garment-focused Diffusion Models

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

Improving Diffusion Models for Virtual Try-on

VTNCT: an Image-Based Virtual Try-on Network by Combining Feature with Pixel Transformation

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Fashion-VDM: Video Diffusion Model for Virtual Try-On

CS-VITON: a realistic virtual try-on network based on clothing region alignment and SPM

ViViD: Video Virtual Try-on using Diffusion Models

Arbitrary Virtual Try-On Network: Characteristics Preservation and Trade-off between Body and Clothing

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

DP-VTON: Toward Detail-Preserving Image-Based Virtual Try-on Network

M&M VTO: Multi-Garment Virtual Try-On and Editing