Abstract: Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation leveraging a pre-trained text-to-image diffusion model. For view-consistent sampling, we first maintain a texture map in RGB space that is parameterized by the denoising step and updated after each sampling step of the diffusion model to progressively reduce the view discrepancy. An attention-guided multi-view sampling strategy is exploited to broadcast the appearance information across views. To preserve texture details, we develop a noise resampling technique that aids in the estimation of noise, generating inputs for subsequent denoising steps, as directed by the text prompt and current texture map. Through extensive qualitative and quantitative evaluations, we demonstrate that our proposed method produces significantly better texture quality for diverse 3D objects with a high degree of view consistency and rich appearance details, outperforming current state-of-the-art methods. Furthermore, our proposed texture generation technique can also be applied to texture editing while preserving the original identity. More experimental results are available at https://dong-huo.github.io/TexGen/.

What problem does this paper attempt to address?

The main goal of this paper is to address two key issues in current 3D texture generation: view inconsistency and over-smoothing.

1. **View Consistency Issue**: Existing methods based on text-to-image (T2I) diffusion models often produce noticeable seams or inconsistencies between different viewpoints when generating 3D object textures. This is mainly due to the lack of sufficient information transfer mechanisms during multi-view sampling, resulting in texture details that do not match well when the same object is viewed from different angles.
2. **Over-Smoothing Issue**: Some methods, while capable of generating high-quality textures, lose texture details during the generation process, leading to overly smooth and less realistic final results.

To address the above issues, the paper proposes a new framework called TexGen. This framework combines a pre-trained T2I diffusion model with a novel multi-view sampling and resampling strategy to directly generate view-consistent and detail-rich 3D textures. Specifically:

- **Multi-View Sampling**: By maintaining an iteratively updated UV texture map, texture details from multiple viewpoints are predicted and assembled at each denoising step, ensuring that the generated texture remains consistent across different views.
- **Attention-Guided Multi-View Sampling**: An attention mechanism is introduced to guide the texture generation process across viewpoints, ensuring the consistency of texture details.
- **Text and Texture-Guided Resampling**: A new noise estimation technique is developed, utilizing the current texture map and text prompts to optimize noise estimation, thereby avoiding the over-smoothing issue.

Through these methods, the paper aims to achieve automatic generation of high-quality, view-consistent, and detail-rich 3D textures, demonstrating significant advantages over existing state-of-the-art methods.
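The two ideas of assembling per-view predictions into one shared texture map and re-estimating noise from that texture can be illustrated with a small NumPy sketch. This is not the paper's implementation. The function names (`blend_views_into_texture`, `reestimate_noise`, `ddim_step`) are hypothetical, the per-texel weighted average is an assumed blending rule, and the noise relation is the standard DDIM identity rather than TexGen's exact resampling procedure. Rendering, back-projection, attention guidance, and the diffusion model itself are omitted, and all arrays are assumed to live in the same space.

```python
import numpy as np


def blend_views_into_texture(view_colors, view_weights, eps=1e-8):
    """Weighted average of per-view texel predictions (assumed blending rule).

    view_colors:  (V, H, W, 3) colors each view back-projects onto the UV map
    view_weights: (V, H, W)    non-negative weights, 0 where a texel is not seen
    Returns the blended (H, W, 3) texture and an (H, W) coverage mask.
    """
    w = view_weights[..., None]                    # (V, H, W, 1)
    total = w.sum(axis=0)                          # (H, W, 1)
    blended = (w * view_colors).sum(axis=0) / np.maximum(total, eps)
    covered = total[..., 0] > 0                    # texels seen by any view
    return blended, covered


def reestimate_noise(x_t, x0_from_texture, alpha_bar_t):
    """Solve x_t = sqrt(a) * x0 + sqrt(1 - a) * noise for the noise term,
    using a clean estimate x0 rendered from the current texture map."""
    return (x_t - np.sqrt(alpha_bar_t) * x0_from_texture) / np.sqrt(1.0 - alpha_bar_t)


def ddim_step(x0, noise, alpha_bar_prev):
    """Deterministic DDIM update to the previous, less noisy timestep."""
    return np.sqrt(alpha_bar_prev) * x0 + np.sqrt(1.0 - alpha_bar_prev) * noise


# Toy usage: two views that overlap on the middle columns of a 4x4 UV map.
rng = np.random.default_rng(0)
colors = rng.random((2, 4, 4, 3))
weights = np.zeros((2, 4, 4))
weights[0, :, :3] = 1.0                            # view 0 sees columns 0..2
weights[1, :, 1:] = 1.0                            # view 1 sees columns 1..3
texture, covered = blend_views_into_texture(colors, weights)

# Treat the blended texture as the clean estimate for one noisy sample.
x_t = rng.standard_normal(texture.shape)
noise = reestimate_noise(x_t, texture, alpha_bar_t=0.5)
x_prev = ddim_step(texture, noise, alpha_bar_prev=0.7)
```

Because every view reads its clean estimate from the same blended texture, overlapping views agree on shared texels by construction. Deriving the noise from that texture, rather than averaging the noisy samples directly, is one plausible way to avoid the blurring that plain averaging tends to introduce.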

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

TEXGen: a Generative Diffusion Model for Mesh Textures

TexPainter: Generative Mesh Texturing with Multi-view Consistency

MVTexGen: Synthesising 3D Textures Using Multi-View Diffusion

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Chasing Consistency in Text-to-3D Generation from a Single Image

GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation

GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Text-Guided Texturing by Synchronized Multi-View Diffusion

StyleTex: Style Image-Guided Texture Generation for 3D Models

TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

TEXTure: Text-Guided Texturing of 3D Shapes

UV-free Texture Generation with Denoising and Geodesic Heat Diffusions

Texture Generation on 3D Meshes with Point-UV Diffusion

Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion

Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models

Generation of View-Dependent Textures for an Inaccurate Model

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects