Abstract: Image-based virtual try-on, which fits target in-shop clothes onto a reference person, is challenging under diverse human poses. Previous works focus on preserving clothing details (e.g., texture, logos, patterns) when transferring the desired clothes onto a target person under a fixed pose. However, the performance of existing methods drops significantly when they are extended to multi-pose virtual try-on. In this paper, we propose an end-to-end Semantic Prediction Guidance multi-pose Virtual Try-On Network (SPG-VTON), which can fit the desired clothing onto a reference person under arbitrary poses. Concretely, SPG-VTON is composed of three sub-modules. First, a Semantic Prediction Module (SPM) generates the desired semantic map. The predicted semantic map provides richer guidance for locating the desired clothes region and producing a coarse try-on image. Second, a Clothes Warping Module (CWM) warps the in-shop clothes to the desired shape according to the predicted semantic map and the desired pose. Specifically, we introduce a conductible cycle consistency loss to alleviate misalignment in the clothes warping process. Third, a Try-on Synthesis Module (TSM) combines the coarse result and the warped clothes to generate the final virtual try-on image, which preserves the details of the desired clothes under the desired pose. In addition, we introduce a face identity loss to refine the facial appearance while maintaining the identity of the final virtual try-on result. We evaluate the proposed method on the largest multi-pose dataset (MPV) and the DeepFashion dataset. Qualitative and quantitative experiments show that SPG-VTON is superior to state-of-the-art methods and is robust to data noise, including background and accessory changes (i.e., hats and handbags), showing good scalability to real-world scenarios.
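The data flow between the three sub-modules can be sketched roughly as follows. This is only an illustrative sketch of the ordering described in the abstract, not the authors' implementation: the class name `TryOnPipeline`, the argument names, and the assumption that the SPM stage returns both the semantic map and the coarse try-on image are hypothetical, and each module is passed in as an opaque callable rather than a real network.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class TryOnPipeline:
    """Hypothetical wrapper that chains SPM -> CWM -> TSM.

    Each field is any callable standing in for a trained module;
    the signatures below are assumptions made for illustration.
    """

    # (person, clothes, target_pose) -> (semantic_map, coarse_try_on)
    spm: Callable[[Any, Any, Any], Tuple[Any, Any]]
    # (clothes, semantic_map, target_pose) -> warped_clothes
    cwm: Callable[[Any, Any, Any], Any]
    # (coarse_try_on, warped_clothes) -> final_try_on
    tsm: Callable[[Any, Any], Any]

    def __call__(self, person: Any, clothes: Any, target_pose: Any) -> Any:
        # Stage 1: predict the target semantic map and a coarse result.
        semantic_map, coarse = self.spm(person, clothes, target_pose)
        # Stage 2: warp the in-shop clothes, guided by the predicted
        # semantic map and the desired pose.
        warped = self.cwm(clothes, semantic_map, target_pose)
        # Stage 3: fuse the coarse result with the warped clothes.
        return self.tsm(coarse, warped)
```

In this sketch the training losses mentioned above (the conductible cycle consistency loss and the face identity loss) are deliberately left out, since the abstract does not give their definitions.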

RMGN: A Regional Mask Guided Network for Parser-free Virtual Try-on

PF-VTON: Toward High-Quality Parser-Free Virtual Try-On Network

Toward Realistic Virtual Try-on Through Landmark Guided Shape Matching

PFDM: Parser-Free Virtual Try-on via Diffusion Model

PG-VTON: A Novel Image-Based Virtual Try-On Method Via Progressive Inference Paradigm

SPG-VTON: Semantic Prediction Guidance for Multi-pose Virtual Try-on

GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning

CS-VITON: a realistic virtual try-on network based on clothing region alignment and SPM

PEMF-VVTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

VTON-MP: Multi-Pose Virtual Try-On Via Appearance Flow and Feature Filtering

IMAGDressing-v1: Customizable Virtual Dressing

Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses

VTNCT: an Image-Based Virtual Try-on Network by Combining Feature with Pixel Transformation

VTON-HF: High Fidelity Virtual Try-on Network Via Semantic Adaptation

DP-VTON: Toward Detail-Preserving Image-Based Virtual Try-on Network

Do Not Mask What You Do Not Need to Mask: a Parser-Free Virtual Try-On

BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

Fast and robust virtual try-on based on parser-free generative adversarial network

DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning

CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

Multi-Pose Virtual Try-On Via Self-Adaptive Feature Filtering