Abstract: Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.

What problem does this paper attempt to address?

The paper addresses precise, natural-language-driven control over texture editing that leaves the identity of the texture unchanged. Existing image-editing methods fall short on textures: methods that edit by manipulating attention maps do not capture the structural features of a texture well, so the edited texture loses its original identity. Many existing methods also require large amounts of labeled data or fine-tuning of the model, which is impractical in real applications.

To address this, the paper proposes TexSliders, a method that defines editing directions in the CLIP embedding space. Users specify an editing direction with simple text prompts, and the texture's identity is maintained during editing without any labeled data or modification of the model.

The main contributions of the paper are:

1. **Defining the editing direction in CLIP space from text prompts**: The editing direction is the difference between the centroids of the CLIP image embeddings corresponding to two text prompts.
2. **Dimension selection to improve identity preservation**: Only the dimensions that contribute to the edited attribute are kept, which reduces identity changes caused by irrelevant dimensions.
3. **Analyzing generalization, the compositionality of editing directions, and application to generated images and real photos**: These experiments verify the effectiveness and robustness of the method.

Together, these contributions make TexSliders a simple and effective method for natural-language-based texture editing that preserves image quality and the seamless tiling of the texture.
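The first two contributions can be illustrated with a small numerical sketch. This is not the authors' implementation: random arrays stand in for CLIP image embeddings sampled for the source and target prompts, the embedding width of 768 is an assumption, and `select_dimensions` uses a top-k magnitude rule as a hypothetical selection criterion, since the summary does not state how the paper picks dimensions.

```python
import numpy as np


def edit_direction(src_embeds, tgt_embeds):
    """Difference between the centroids of two embedding sets of shape (n, d)."""
    return tgt_embeds.mean(axis=0) - src_embeds.mean(axis=0)


def select_dimensions(direction, k):
    """Keep the k largest-magnitude entries and zero the rest.

    Hypothetical criterion, used here only for illustration.
    """
    masked = np.zeros_like(direction)
    top = np.argsort(np.abs(direction))[-k:]
    masked[top] = direction[top]
    return masked


def apply_slider(embed, direction, alpha):
    """Move an embedding along the direction; alpha is the slider value."""
    return embed + alpha * direction


# Synthetic stand-ins for embeddings sampled for two prompts,
# e.g. "aged wood" (source) and "new wood" (target).
rng = np.random.default_rng(0)
d = 768  # assumed embedding width
shift = np.zeros(d)
shift[:10] = 2.0  # the attribute lives in a few dimensions
src = rng.normal(size=(64, d))
tgt = rng.normal(size=(64, d)) + shift

direction = edit_direction(src, tgt)
sparse_direction = select_dimensions(direction, k=10)

texture = rng.normal(size=d)  # embedding of the texture to edit
edited = apply_slider(texture, sparse_direction, alpha=0.5)
```

In this sketch the masked direction changes only the selected dimensions of the texture embedding, which is the intuition behind using dimension selection to limit identity drift. In the actual pipeline, the edited embedding would then condition the diffusion model.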

TexSliders: Diffusion-Based Texture Editing in CLIP Space
