Abstract: While high-quality texture maps are essential for realistic 3D asset rendering, few studies have explored learning directly in the texture space, especially on large-scale datasets. In this work, we depart from the conventional approach of relying on pre-trained 2D diffusion models for test-time optimization of 3D textures. Instead, we focus on the fundamental problem of learning in the UV texture space itself. For the first time, we train a large diffusion model capable of directly generating high-resolution texture maps in a feed-forward manner. To facilitate efficient learning in high-resolution UV spaces, we propose a scalable network architecture that interleaves convolutions on UV maps with attention layers on point clouds. Leveraging this architectural design, we train a 700 million parameter diffusion model that can generate UV texture maps guided by text prompts and single-view images. Once trained, our model naturally supports various extended applications, including text-guided texture inpainting, sparse-view texture completion, and text-driven texture synthesis. Project page is at http://cvmi-lab.github.io/TEXGen/.

### What problems does this paper attempt to solve?

This paper aims to solve several key problems in 3D mesh texture generation:

1. **Limitations of existing methods**:
   - **Dependence on pre-trained 2D diffusion models**: Most existing methods rely on pre-trained 2D diffusion models for test-time optimization, which makes per-object optimization time-consuming, requires complex parameter tuning, and ties the results to 2D priors.
   - **Lack of 3D consistency**: Methods based on 2D diffusion models often lack global 3D consistency and are especially prone to inconsistencies in multi-view texture synthesis.
   - **Category-specificity**: Many methods are limited to specific object categories and are difficult to generalize to a wider range of objects.
2. **High-resolution texture generation**: Existing methods struggle to generate high-resolution texture maps (such as 1024×1024), especially when generating high-quality textures directly from text or single-view images.
3. **End-to-end generation**: There is no end-to-end model that can directly generate high-quality texture maps without additional stages or test-time optimization.

To solve these problems, the authors propose TEXGen, a large-scale generative diffusion model for 3D mesh texture generation. Specifically, TEXGen addresses them in the following ways:

- **Direct learning in UV texture space**: TEXGen learns directly in the UV texture space, avoiding dependence on rendering losses and thereby improving generation quality.
- **Hybrid 2D-3D network architecture**: By combining 2D convolutions with 3D point-cloud attention, TEXGen can ensure global 3D consistency while preserving local high-resolution detail.
- **Large-scale datasets and model parameters**: By using a large-scale dataset and a diffusion model with 700 million parameters, TEXGen can generate high-quality texture maps and supports multiple applications, such as text-guided texture synthesis, texture inpainting, and texture completion from sparse views.

In summary, the main goal of this paper is to develop an efficient, high-quality, and general-purpose 3D mesh texture generation model that overcomes the limitations of existing methods.
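The idea of interleaving convolutions on the UV map with attention over a point cloud can be illustrated with a toy sketch. This is not the authors' network: a masked 3×3 mean filter stands in for learned 2D convolutions, a softmax over negative squared 3D distance stands in for learned attention, and every name here (`conv3x3_mean`, `point_attention`, `hybrid_block`) is hypothetical. The example assumes a tiny 1×5 UV map with two islands that are separated by an empty texel in UV space but lie next to each other on the 3D surface.

```python
import math

def conv3x3_mean(feat, mask):
    """Masked 3x3 mean filter on the UV grid (stand-in for a learned 2D convolution)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # empty texels carry no surface information
            vals = [feat[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                    if mask[j][i]]
            out[y][x] = sum(vals) / len(vals)
    return out

def point_attention(points, feats, temperature=1.0):
    """Self-attention over 3D points; weights decay with squared 3D distance
    (an assumed stand-in for learned query/key attention)."""
    out = []
    for p in points:
        logits = [-sum((a - b) ** 2 for a, b in zip(p, q)) / temperature
                  for q in points]
        top = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        out.append(sum(w * f for w, f in zip(weights, feats)) / total)
    return out

def hybrid_block(feat, mask, position):
    """One UV-space convolution step followed by one point-cloud attention step."""
    feat = conv3x3_mean(feat, mask)
    # Gather valid texels into a point cloud using their 3D surface positions.
    coords = [(y, x) for y in range(len(feat))
              for x in range(len(feat[0])) if mask[y][x]]
    points = [position[y][x] for y, x in coords]
    attended = point_attention(points, [feat[y][x] for y, x in coords])
    # Scatter the attended features back onto the UV grid.
    out = [row[:] for row in feat]
    for (y, x), value in zip(coords, attended):
        out[y][x] = value
    return out

# Two UV islands (columns 0-1 and 3-4) split by an empty texel at column 2.
mask = [[True, True, False, True, True]]
feat = [[1.0, 1.0, 0.0, 0.0, 0.0]]
# Columns 1 and 3 are far apart in UV space but only 0.1 apart in 3D.
position = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), None,
             (1.1, 0.0, 0.0), (2.1, 0.0, 0.0)]]

conv_only = conv3x3_mean(feat, mask)      # islands stay isolated
out = hybrid_block(feat, mask, position)  # islands exchange information
print(conv_only[0])
print(out[0])
```

In this toy setting the convolution alone never moves information across the UV seam, while the attention step mixes the two islands because their texels are close in 3D. That is the intuition behind pairing local operations in UV space with global operations on the surface.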

TEXGen: a Generative Diffusion Model for Mesh Textures

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

MVTexGen: Synthesising 3D Textures Using Multi-View Diffusion

GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

TexPainter: Generative Mesh Texturing with Multi-view Consistency

Texture Generation on 3D Meshes with Point-UV Diffusion

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

GenesisTex: Adapting Image Denoising Diffusion to Texture Space

TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

Text-guided High-definition Consistency Texture Model

UV-free Texture Generation with Denoising and Geodesic Heat Diffusions

Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis

Consistent Mesh Diffusion

Text-Guided Texturing by Synchronized Multi-View Diffusion

UVMap-ID: A Controllable and Personalized UV Map Generative Model

Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion

TM-NET: Deep Generative Networks for Textured Meshes

DragTex: Generative Point-Based Texture Editing on 3D Mesh

TUVF: Learning Generalizable Texture UV Radiance Fields

TEXTure: Text-Guided Texturing of 3D Shapes