Abstract: Diffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, while the development of 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging. The reasons are twofold. First, text prompts cannot provide sufficient details in describing clothing. Second, 2D VTON results generated from different viewpoints of the same 3D scene lack coherence and spatial relationships, hence frequently leading to appearance inconsistencies and geometric distortions. To resolve these problems, we introduce an image-prompted 3D VTON method (dubbed GS-VTON) which, by leveraging 3D Gaussian Splatting (3DGS) as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. (1) Specifically, we propose a personalized diffusion model that utilizes low-rank adaptation (LoRA) fine-tuning to incorporate personalized information into pre-trained 2D VTON models. To achieve effective LoRA training, we introduce a reference-driven image editing approach that enables the simultaneous editing of multi-view images while ensuring consistency. (2) Furthermore, we propose a persona-aware 3DGS editing framework to facilitate effective editing while maintaining consistent cross-view appearance and high-quality 3D geometry. (3) Additionally, we have established a new 3D VTON benchmark, 3D-VTONBench, which facilitates comprehensive qualitative and quantitative 3D VTON evaluations. Through extensive experiments and comparative analyses with existing methods, the proposed GS-VTON has demonstrated superior fidelity and advanced editing capabilities, affirming its effectiveness for 3D VTON.

What problem does this paper attempt to address?

This paper aims to solve several key problems in 3D virtual try-on (VTON):

1. **Extension from 2D VTON to 3D VTON**: Although diffusion-based 2D VTON has made significant progress, 3D VTON has lagged behind. Existing 3D VTON methods struggle to generate multi-view-consistent images and to model the 3D representation of clothing accurately.
2. **Multi-view consistency**: Text prompts used for 3D scene editing cannot describe clothing in sufficient detail, and 2D VTON results generated from different viewpoints of the same scene lack coherence and spatial relationships. Together these cause appearance inconsistencies and geometric distortions.
3. **Personalization and high-quality editing**: Existing 2D VTON diffusion models may produce blur and geometric distortion on inputs outside their training distribution, and they have difficulty keeping other body parts consistent when modifying clothing.

To solve these problems, the authors propose a new 3D VTON method, **GS-VTON** (Gaussian Splatting-based 3D Virtual Try-On). The method achieves controllable 3D virtual try-on through the following innovations:

- **Personalized diffusion model**: Low-rank adaptation (LoRA) fine-tuning incorporates personalized information into the pre-trained 2D VTON model, improving its adaptability to the specific input data.
- **Reference-driven image editing**: A reference-driven image editing method edits multi-view images simultaneously while keeping them consistent.
- **Persona-aware 3D Gaussian Splatting editing framework**: A persona-aware 3DGS editing process fuses two predicted attention features, one for editing and one for enforcing consistency between views, to achieve effective editing and better multi-view consistency.

In addition, the authors establish a new 3D VTON benchmark, **3D-VTONBench**, to support more comprehensive qualitative and quantitative evaluations. In experiments and comparisons with existing methods, GS-VTON shows higher fidelity and more advanced editing capabilities, supporting its effectiveness for 3D virtual try-on.
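As a rough illustration of the low-rank adaptation (LoRA) idea that the personalized diffusion model relies on, the sketch below shows the generic LoRA weight update, W' = W + (alpha / r) · B · A, in plain Python. It is not the authors' implementation: the function names (`matmul`, `lora_weight`), the layer sizes, the rank, and the `alpha` value are all assumptions chosen for the example, and the paper's actual layers and hyperparameters are not given in this summary.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    w      : frozen pre-trained weight, shape (d_out, d_in)
    lora_b : trainable factor B, shape (d_out, r)
    lora_a : trainable factor A, shape (r, d_in)
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical sizes, chosen only for illustration.
random.seed(0)
d_out, d_in, rank = 8, 8, 2

# Frozen pre-trained weight (random stand-in values).
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Standard LoRA initialisation: A is small random, B is zero,
# so the adapted layer starts out identical to the pre-trained one.
lora_a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

w_adapted = lora_weight(w, lora_b, lora_a, alpha=4.0)

# Only A and B are trained, which is fewer parameters than the full matrix.
trainable = rank * (d_in + d_out)
full = d_in * d_out
```

Because B starts at zero, `w_adapted` equals `w` before any training, so fine-tuning begins from the pre-trained model's behaviour and only the two small factors are updated to absorb the personalized information.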

GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting
