Abstract: Style-guided texture generation aims to generate a texture that is harmonious with both the style of the reference image and the geometry of the input mesh, given a reference style image and a 3D mesh with its text description. Although diffusion-based 3D texture generation methods, such as distillation sampling, have numerous promising applications in stylized games and films, they face two challenges: 1) completely decoupling style from content in the reference image for 3D models, and 2) aligning the generated texture with the color tone and style of the reference image and with the given text prompt. To this end, we introduce StyleTex, an innovative diffusion-model-based framework for creating stylized textures for 3D models. Our key insight is to decouple style information from the reference image while disregarding content in diffusion-based distillation sampling. Specifically, given a reference image, we first decompose its style feature from the image CLIP embedding by subtracting the embedding's orthogonal projection in the direction of the content feature, which is represented by a text CLIP embedding. This disentanglement of the reference image's style and content information yields distinct style and content features. We then inject the style feature into the cross-attention mechanism to incorporate it into the generation process, while using the content feature as a negative prompt to further dissociate content information. Finally, we incorporate these strategies into StyleTex to obtain stylized textures. The textures generated by StyleTex retain the style of the reference image while aligning with the text prompt and the intrinsic details of the given 3D mesh. Quantitative and qualitative experiments show that our method outperforms existing baseline methods by a significant margin.
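The style decomposition described in the abstract is a single vector operation. Writing \(e_I\) for the image CLIP embedding and \(e_T\) for the text CLIP embedding of the content (notation introduced here, not the paper's), the style feature is \(e_I - \frac{\langle e_I, e_T\rangle}{\lVert e_T\rVert^2} e_T\). The minimal sketch below assumes plain float lists stand in for CLIP vectors. It does not call a CLIP encoder, and it does not claim whether the paper normalizes embeddings before projecting. It only shows that the residual is orthogonal to the content direction.

```python
# Toy sketch: remove the content direction from an image embedding.
# Assumption: embeddings are plain float lists standing in for CLIP vectors.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose_style(image_emb, content_emb):
    """Return (style, content_component) for an image embedding.

    content_component is the orthogonal projection of image_emb onto the
    direction of content_emb; style is the residual after subtracting it,
    so style is orthogonal to content_emb.
    """
    if len(image_emb) != len(content_emb):
        raise ValueError("embeddings must have the same dimension")
    denom = dot(content_emb, content_emb)
    if denom == 0.0:
        raise ValueError("content embedding must be non-zero")
    scale = dot(image_emb, content_emb) / denom
    content_component = [scale * c for c in content_emb]
    style = [x - p for x, p in zip(image_emb, content_component)]
    return style, content_component


if __name__ == "__main__":
    # Hypothetical 3-D "embeddings" chosen so the result is easy to check.
    image_emb = [3.0, 4.0, 0.0]
    content_emb = [1.0, 0.0, 0.0]
    style, content = decompose_style(image_emb, content_emb)
    print(style)    # [0.0, 4.0, 0.0]
    print(content)  # [3.0, 0.0, 0.0]
```

Subtracting only the projection onto \(e_T\) removes the one direction the text prompt identifies as "content" and leaves every other component of the image embedding untouched.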

What problem does this paper attempt to address?

The problem this paper addresses is how to generate textures that are consistent with the style of a reference image and coordinated with the geometric structure of a 3D model. Specifically, the authors focus on the setting where, given a reference style image and an untextured 3D model, the generated texture should not only preserve the style of the reference image but also be consistent with the text prompt and the inherent details of the 3D model.

### Main Challenges

1. **Decoupling of Style and Content**: Style must be fully separated from content in the reference image, so that the generated texture carries only the desired style without introducing unwanted content.
2. **Preservation of Color Tones**: The generated texture should match the reference image in color tone and style while avoiding problems such as oversaturation and over-smoothing.

### Solutions

To address these problems, the authors propose a diffusion-model framework named StyleTex. Its main contributions are:

1. **Style Decoupling and Injection Strategy**: Decomposing the reference image into style and content features in the CLIP embedding space guides the stylization process and addresses both content leakage and style deviation.
2. **Geometry-Aware ControlNet**: Keeps the generated texture geometrically consistent across different viewpoints.
3. **Interval Score Matching (ISM)**: Mitigates overly smooth and blurry results, improving generation quality and accelerating convergence.

### Method Overview

The workflow of StyleTex is shown in Figure 2 and mainly includes the following steps:

- **Input**: An untextured 3D model \(M\), a reference style image \(I_{ref}\), and a text prompt \(y\).
- **Style Feature Extraction**: Use the ODCR method to extract a content-independent style feature \(f_{ref}^s\) from the reference image.
- **Optimization of Neural Color Field**: Optimize the neural color field using guidance from the diffusion model's UNet (distillation sampling), so that the generated texture both matches the reference style and is consistent with the geometric structure.
- **Texture Sampling and Application**: Sample texture maps from the optimized neural color field; these can be applied directly in game or film production.

### Experiments and Results

Ablation experiments verify the effectiveness of each component, including strategies such as negative content prompts and style guidance. The experimental results show that StyleTex significantly outperforms existing methods in generating high-quality, style-consistent textures.

In conclusion, StyleTex provides an efficient and robust method for generating textures that are consistent with the style of a reference image and coordinated with the geometric structure of the 3D model, thereby advancing work in computer graphics and vision.
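The ablation mentions style guidance and negative content prompts, and the abstract describes injecting the style feature into cross-attention and using the content feature as a negative prompt. The toy sketch below illustrates both ideas under stated assumptions. Style injection is modeled as IP-Adapter-style decoupled cross-attention, which is one common way to add an image-derived feature to a diffusion UNet. The paper's exact mechanism may differ, and the learned key/value projections are omitted. The negative prompt is modeled as standard classifier-free guidance in which the negative (content) prediction replaces the unconditional one. How StyleTex wires this into its ISM objective is not reproduced here. All function names are hypothetical.

```python
import math

# Toy sketch: (a) IP-Adapter-style decoupled cross-attention for injecting a
# style feature, and (b) classifier-free guidance with a negative prompt.
# Assumption: plain float lists stand in for projected tokens and noise.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def decoupled_cross_attention(query, text_kv, style_kv, style_scale):
    """Text cross-attention plus a separately weighted attention over style tokens."""
    text_out = attend(query, *text_kv)
    style_out = attend(query, *style_kv)
    return [t + style_scale * s for t, s in zip(text_out, style_out)]


def guided_noise(eps_pos, eps_neg, guidance_scale):
    """CFG where the negative (content) prompt replaces the unconditional branch:
    eps = eps_neg + w * (eps_pos - eps_neg)."""
    return [n + guidance_scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


if __name__ == "__main__":
    # Single-token keys/values, so each attention returns its value exactly.
    text_kv = ([[1.0, 0.0]], [[1.0, 2.0]])
    style_kv = ([[0.0, 1.0]], [[0.5, 0.5]])
    print(decoupled_cross_attention([1.0, 0.0], text_kv, style_kv, 2.0))  # [2.0, 3.0]
    print(guided_noise([1.0, 0.0], [0.0, 1.0], 2.0))  # [2.0, -1.0]
```

With `style_scale = 0` the injection reduces to ordinary text cross-attention. A guidance scale above 1 pushes the prediction away from the content-prompt branch, which is the intuition behind using the content feature as a negative prompt.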

StyleTex: Style Image-Guided Texture Generation for 3D Models
