GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao
2024-05-23
Abstract: The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To further address issues, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy to gradually mitigate potential issues. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in leading to complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address several key issues in Virtual Try-On (VTON), particularly the technical challenges encountered in the transition from 2D VTON to 3D VTON. Specifically:

1. **Realism and Personalized Editing in 3D VTON**:
   - Most current research focuses on 2D VTON and relies on large amounts of training data to improve accuracy. In practical applications, however, these methods perform poorly on user-customized data, especially in terms of garment accuracy and face blurring.
   - Existing 3D VTON work mainly focuses on the compatibility between clothing and body shape and lacks simulation of real human forms.
2. **Multi-View Consistency Issue**:
   - Because diffusion models struggle to achieve multi-view consistency, editing results differ across views during 3D scene editing.
   - This inconsistency is particularly evident in tasks that require precise editing of specific areas, such as VTON.
3. **3D Editing with Image Prompts**:
   - Current 3D editing methods mainly rely on text prompts, which often fail to meet user expectations in practical applications, especially when precise personalized editing is required.
   - To better achieve personalized editing, the paper proposes using image prompts for 3D editing to improve the realism and consistency of the editing results.

The paper addresses these issues with GaussianVTON, a new 3D VTON framework that combines Gaussian Splatting editing with a 2D VTON model (LaDI-VTON). It introduces a three-stage refinement strategy and a new editing strategy, Edit Recall Reconstruction (ERR), to ensure high-quality output in multi-view editing. According to the paper, experimental results validate the effectiveness and superiority of GaussianVTON, providing a new research starting point for 3D VTON and 3D editing.