StyleTex: Style Image-Guided Texture Generation for 3D Models

Zhiyu Xie, Yuqing Zhang, Xiangjun Tang, Yiqian Wu, Dehan Chen, Gongsheng Li, Xiaogang Jin
2024-11-01
Abstract: Style-guided texture generation aims to generate a texture that is harmonious with both the style of the reference image and the geometry of the input mesh, given a reference style image and a 3D mesh with its text description. Although diffusion-based 3D texture generation methods, such as distillation sampling, have numerous promising applications in stylized games and films, they must address two challenges: 1) decoupling style and content completely from the reference image for 3D models, and 2) aligning the generated texture with the color tone and style of the reference image and with the given text prompt. To this end, we introduce StyleTex, an innovative diffusion-model-based framework for creating stylized textures for 3D models. Our key insight is to decouple style information from the reference image while disregarding content in diffusion-based distillation sampling. Specifically, given a reference image, we first decompose its style feature from the image CLIP embedding by subtracting the embedding's orthogonal projection in the direction of the content feature, which is represented by a text CLIP embedding. Our novel approach to disentangling the reference image's style and content information allows us to generate distinct style and content features. We then inject the style feature into the cross-attention mechanism to incorporate it into the generation process, while utilizing the content feature as a negative prompt to further dissociate content information. Finally, we incorporate these strategies into StyleTex to obtain stylized textures. The resulting textures generated by StyleTex retain the style of the reference image, while also aligning with the text prompts and intrinsic details of the given 3D mesh. Quantitative and qualitative experiments show that our method outperforms existing baseline methods by a significant margin.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The paper addresses how to generate a texture that matches the style of a reference image while remaining consistent with the geometry of a 3D model. Specifically, given a reference style image, an untextured 3D model, and a text prompt, the generated texture should retain the style of the reference image while also aligning with the text prompt and the intrinsic details of the model.

### Main Challenges

1. **Decoupling of Style and Content**: The style and content information in the reference image must be completely separated, so that the generated texture carries only the desired style and none of the reference image's content.
2. **Preservation of Color Tones**: The generated texture must match the reference image in color tone and style while avoiding problems such as oversaturation and over-smoothing.

### Solutions

To address these challenges, the authors propose StyleTex, a diffusion-model-based framework. Its main contributions are:

1. **Style Decoupling and Injection Strategy**: Style features and content features are decomposed in the CLIP embedding space to guide the stylization process, which addresses content leakage and style deviation.
2. **Geometry-Aware ControlNet**: Keeps the generated texture geometrically consistent across different viewpoints.
3. **Interval Score Matching (ISM)**: Counteracts overly smooth and blurry results, improving generation quality and accelerating convergence.

### Method Overview

The workflow of StyleTex is shown in Figure 2 of the paper and mainly includes the following steps:

- **Input**: An untextured 3D model \(M\), a reference style image \(I_{ref}\), and a text prompt \(y\).
- **Style Feature Extraction**: The ODCR method extracts a content-independent style feature \(f_{ref}^s\) from the reference image.
- **Optimization of Neural Color Field**: The neural color field is optimized through the U-Net so that the generated texture both follows the style and stays consistent with the geometric structure.
- **Texture Sampling and Application**: Texture maps are sampled from the optimized neural color field and can be applied directly in game or film production.

### Experiments and Results

Ablation experiments verify the effectiveness of each component, including the negative content prompt and the style guidance. The experimental results show that StyleTex significantly outperforms existing methods in generating high-quality, style-consistent textures.

In conclusion, StyleTex provides an efficient and robust way to generate textures that are consistent with the style of the reference image and coordinated with the geometric structure of the 3D model.
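
### Illustrative Sketch: Removing the Content Direction

The abstract describes the style feature as the image CLIP embedding minus its orthogonal projection onto the content feature. Read literally, that is \(f^{s} = f^{img} - \frac{\langle f^{img}, f^{c} \rangle}{\lVert f^{c} \rVert^{2}} f^{c}\), where the symbols \(f^{img}\) and \(f^{c}\) are notation assumed here for the image and content embeddings. The following is a minimal sketch of that vector operation only, using plain Python lists as stand-ins for CLIP embeddings. The function name `remove_content_direction` is hypothetical, and the paper's actual ODCR implementation may differ (for example in normalization or in which embeddings it operates on).

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_content_direction(
    image_emb: Sequence[float], content_emb: Sequence[float]
) -> List[float]:
    """Subtract from image_emb its orthogonal projection onto content_emb.

    Hypothetical helper: the result has no component along content_emb.
    """
    if len(image_emb) != len(content_emb):
        raise ValueError("embeddings must have the same dimension")
    denom = dot(content_emb, content_emb)
    if denom == 0.0:
        raise ValueError("content embedding must be non-zero")
    scale = dot(image_emb, content_emb) / denom
    return [x - scale * c for x, c in zip(image_emb, content_emb)]


# Toy 3-D vectors standing in for real CLIP embeddings (assumed values).
image_emb = [3.0, 4.0, 0.0]
content_emb = [1.0, 0.0, 0.0]

style_emb = remove_content_direction(image_emb, content_emb)
print(style_emb)                    # [0.0, 4.0, 0.0]
print(dot(style_emb, content_emb))  # 0.0
```

In this toy case the component along the content direction is removed and the remainder is orthogonal to `content_emb`, which is the property the decomposition relies on. In the method as summarized, the resulting style feature is injected through cross-attention, while the content feature serves as a negative prompt.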