Abstract: Text-to-texture generation has recently attracted increasing attention, but existing methods often suffer from the problems of view inconsistencies, apparent seams, and misalignment between textures and the underlying mesh. In this paper, we propose a robust text-to-texture method for generating consistent and seamless textures that are well aligned with the mesh. Our method leverages state-of-the-art 2D diffusion models, including SDXL and multiple ControlNets, to capture structural features and intricate details in the generated textures. The method also employs a symmetrical view synthesis strategy combined with regional prompts for enhancing view consistency. Additionally, it introduces novel texture blending and soft-inpainting techniques, which significantly reduce the seam regions. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods.

What problem does this paper attempt to address?

This paper addresses three shortcomings of existing text-to-texture methods: view inconsistency, misalignment between the texture and the underlying mesh, and visible seams. Specifically:

1. **View Inconsistency**: Existing methods based on 2D diffusion models usually generate a single view at a time, so the texture differs between viewing angles (Figure 2(a)). This includes the "Janus problem", in which front-view features such as a face are repeated on the back view.
2. **Texture-Mesh Misalignment**: Because 2D diffusion models lack 3D awareness, the generated textures often do not align well with the mesh of the 3D object (Figure 2(b)).
3. **Obvious Seams and Artifacts**: In each iteration, the generated image is projected back onto the 3D mesh and merged into the evolving texture through UV mapping. This step often introduces unexpected artifacts and seams (Figure 2(c)).

To solve these problems, the paper proposes a method named RoCoTex, which aims to generate high-quality textures that are view-consistent, well aligned with the mesh, and seamless. RoCoTex relies on the following techniques:

- **Symmetric View Synthesis Strategy**: Generate two symmetric views simultaneously to improve view consistency, and combine them with regional prompts to alleviate the Janus problem.
- **SDXL and Multiple ControlNets**: Use Stable Diffusion XL (SDXL) with multiple ControlNets (depth, normal, and Canny edge) to capture structural features and details and to improve the alignment between texture and geometry.
- **Confidence-Based Texture Fusion**: Define a confidence value for each pixel and fuse local textures into the global texture according to that confidence, which reduces seams.
- **Soft-Inpainting Based on Differential Diffusion**: Soften the inpainting mask with a Gaussian blur and combine it with the Differential Diffusion technique, so that untextured areas are filled in more naturally and the seams between textured and untextured areas are reduced.

Together, these techniques allow RoCoTex to generate high-quality, consistent, and seamless textures. The reported experiments show that RoCoTex outperforms existing methods on multiple evaluation metrics.
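The confidence-based fusion and the mask-softening step described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not give the confidence definition or the blending rule, so the code assumes a clamped cosine between the surface normal and the direction to the camera as the confidence, and a confidence-weighted average as the fusion rule. The function names `view_confidence`, `fuse_textures`, and `soften_mask` and the parameter `sigma` are hypothetical. The softened mask is only the input that a Differential Diffusion style inpainting stage would consume; the diffusion step itself is not shown.

```python
import numpy as np


def view_confidence(normals, view_dir):
    """Assumed confidence: clamped cosine between normal and direction to camera.

    normals: (H, W, 3) unit normals per texel; view_dir: (3,) vector toward the camera.
    """
    view_dir = np.asarray(view_dir, dtype=np.float64)
    view_dir = view_dir / np.linalg.norm(view_dir)
    cos = normals @ view_dir  # (H, W, 3) @ (3,) -> (H, W)
    return np.clip(cos, 0.0, 1.0)


def fuse_textures(global_tex, global_conf, local_tex, local_conf, eps=1e-8):
    """Blend a newly generated local texture into the global texture.

    Each texel is a confidence-weighted average of the two textures.
    Texels where both confidences are zero keep the global value.
    """
    total = global_conf + local_conf
    w_local = np.where(total > eps, local_conf / np.maximum(total, eps), 0.0)
    fused = (1.0 - w_local)[..., None] * global_tex + w_local[..., None] * local_tex
    fused_conf = np.maximum(global_conf, local_conf)
    return fused, fused_conf


def soften_mask(mask, sigma=2.0):
    """Turn a binary inpainting mask into a soft mask with a separable Gaussian blur."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # Edge padding keeps the output the same size as the input mask.
    padded = np.pad(mask.astype(np.float64), radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)
```

With this weighting, a texel seen head-on in the new view replaces a texel that was previously seen at a grazing angle almost entirely, while texels of similar confidence are averaged, which is one way to avoid a hard boundary between views. The soft mask plays the same role at the border between textured and untextured regions: values between 0 and 1 mark a transition band instead of a sharp edge.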

RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation

Chasing Consistency in Text-to-3D Generation from a Single Image

Text-Guided Texturing by Synchronized Multi-View Diffusion

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

TexPainter: Generative Mesh Texturing with Multi-view Consistency

Consistent Mesh Diffusion

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

Text-guided High-definition Consistency Texture Model

TexRO: Generating Delicate Textures of 3D Models by Recursive Optimization

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes

VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing

TEXTure: Text-Guided Texturing of 3D Shapes

Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control

MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

GenesisTex: Adapting Image Denoising Diffusion to Texture Space