MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan
2024-11-05
Abstract: Texturing is a crucial step in the 3D asset production workflow, which enhances the visual appeal and diversity of 3D assets. Despite recent advancements in Text-to-Texture (T2T) generation, existing methods often yield subpar results, primarily due to local discontinuities, inconsistencies across multiple views, and their heavy dependence on UV unwrapping outcomes. To tackle these challenges, we propose a novel generation-refinement 3D texturing framework called MVPaint, which can generate high-resolution, seamless textures while emphasizing multi-view consistency. MVPaint mainly consists of three key modules. 1) Synchronized Multi-view Generation (SMG). Given a 3D mesh model, MVPaint first simultaneously generates multi-view images by employing an SMG model, which leads to coarse texturing results with unpainted parts due to missing observations. 2) Spatial-aware 3D Inpainting (S3I). To ensure complete 3D texturing, we introduce the S3I method, specifically designed to effectively texture previously unobserved areas. 3) UV Refinement (UVR). Furthermore, MVPaint employs a UVR module to improve the texture quality in the UV space, which first performs a UV-space Super-Resolution, followed by a Spatial-aware Seam-Smoothing algorithm for revising spatial texturing discontinuities caused by UV unwrapping. Moreover, we establish two T2T evaluation benchmarks: the Objaverse T2T benchmark and the GSO T2T benchmark, based on selected high-quality 3D meshes from the Objaverse dataset and the entire GSO dataset, respectively. Extensive experimental results demonstrate that MVPaint surpasses existing state-of-the-art methods. Notably, MVPaint could generate high-fidelity textures with minimal Janus issues and highly enhanced cross-view consistency.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address several key issues in 3D texture generation:

1. **Multi-View Consistency**:
   - Existing methods often produce textures that are inconsistent between views, leading to local style discontinuities and multiple seams.
   - The proposed method aims to keep the generated textures consistent across multiple views.
2. **Diverse Texture Details**:
   - Generated textures are often too smooth and lack detail, especially in high-resolution outputs.
   - The proposed method can generate high-resolution textures with rich details.
3. **UV Unwrapping Robustness**:
   - Existing methods rely heavily on the results of UV unwrapping, leading to texture discontinuities when UV maps are randomly packed.
   - The proposed method reduces dependence on UV unwrapping quality, making automated generation more robust.

### Main Contributions of the Paper

1. **A Robust 3D Texture Generation Framework (MVPaint)**:
   - MVPaint can generate diverse, high-quality, seamless 3D textures while ensuring multi-view consistency.
2. **Various 3D Texture Generation Models, Operations, and Strategies**:
   - These include the Synchronized Multi-view Generation (SMG), Spatial-aware 3D Inpainting (S3I), and UV Refinement (UVR) modules.
   - These components are intended to advance future research in 3D texture generation.
3. **Two Evaluation Benchmarks**:
   - The Objaverse T2T benchmark and the GSO T2T benchmark, for evaluating text-to-texture (T2T) generation performance.
   - Experimental results show that MVPaint outperforms existing state-of-the-art methods on these benchmarks.

### Method Overview

MVPaint consists of three main stages:

1. **Synchronized Multi-view Generation (SMG)**:
   - Uses multi-view diffusion models and cross-attention mechanisms to generate initial low-resolution multi-view images.
   - By synchronizing multi-view generation, it avoids the Janus problem and generates highly consistent multi-view images.
2. **Spatial-aware 3D Inpainting (S3I)**:
   - Performs inpainting in 3D space to fill in unobserved areas, ensuring complete 3D textures.
3. **UV Refinement (UVR)**:
   - Performs super-resolution in UV space to add fine details, then applies a spatial-aware seam-smoothing algorithm to correct seams caused by UV unwrapping.

### Experimental Results

- **Quantitative Results**:
  - On the Objaverse T2T benchmark and the GSO T2T benchmark, MVPaint achieved the best performance in metrics such as FID and KID.
  - On the GSO T2T benchmark in particular, MVPaint outperformed the other methods, indicating good generalization ability.
- **User Study**:
  - User study results show that MVPaint received the highest scores in overall quality, seam visibility, and consistency.

In summary, the MVPaint framework addresses multi-view consistency, rich detail, and robustness to UV unwrapping in 3D texture generation, and it outperforms prior methods on both benchmarks and in the user study.
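For readers unfamiliar with the KID metric mentioned in the quantitative results, the following is a minimal sketch of a single unbiased Kernel Inception Distance estimate with the commonly used cubic polynomial kernel. It is not the paper's evaluation code: the function names (`poly_kernel`, `kid`, `gaussian_features`) are hypothetical, the feature vectors here are synthetic Gaussian samples rather than Inception features of rendered views, and practical implementations usually average the estimate over several random subsets.

```python
import random


def poly_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3, the usual KID choice."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(real, fake):
    """Unbiased squared-MMD estimate between two sets of feature vectors.

    `real` and `fake` are lists of equal-length float lists. In a real
    evaluation they would be image features; here they are an assumption.
    """
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("need at least two feature vectors per set")
    # Within-set terms exclude i == j so the estimate stays unbiased.
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j)
    # The cross term uses every real/fake pair.
    k_rf = sum(poly_kernel(x, y) for x in real for y in fake)
    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


def gaussian_features(rng, count, dim, mean):
    """Synthetic stand-in for image features (illustration only)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    real = gaussian_features(rng, 40, 8, 0.0)
    same = gaussian_features(rng, 40, 8, 0.0)
    shifted = gaussian_features(rng, 40, 8, 1.0)
    print("KID, same distribution:   ", kid(real, same))
    print("KID, shifted distribution:", kid(real, shifted))
```

Lower values indicate that the two feature distributions are closer. Because the estimator is unbiased, it can come out slightly negative when the two sets are drawn from the same distribution.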