TexSliders: Diffusion-Based Texture Editing in CLIP Space

Julia Guerrero-Viu, Milos Hasan, Arthur Roullier, Midhun Harikumar, Yiwei Hu, Paul Guerrero, Diego Gutierrez, Belen Masia, Valentin Deschaintre
DOI: https://doi.org/10.1145/3641519.3657444
2024-05-02
Abstract: Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.
Graphics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses precise, natural-language-driven control over texture editing that leaves the identity of the texture unchanged. Existing image-editing methods fall short on textures: those that edit by manipulating attention maps fail to capture the structural features of a texture, so the edited result loses its original identity. Many existing methods also require large amounts of labeled data or fine-tuning of the model, which is impractical in real applications.

To this end, the paper proposes TexSliders, which defines editing directions in the CLIP embedding space. Users specify an editing direction with simple text prompts, and the texture's identity is maintained during editing without any labeled data or modification of the model.

The main contributions of the paper include:

1. **Defining the editing direction in the CLIP space from text prompts**: The editing direction is the difference between the centroids of the CLIP image embeddings corresponding to two text prompts.
2. **Dimension selection to improve identity preservation**: Keeping only the dimensions that contribute to the edited attribute reduces the identity changes caused by irrelevant dimensions.
3. **Analyzing the generalization ability and compositionality of the editing directions, and their application to generated images and real-life photos**: These experiments verify the effectiveness and robustness of the method.

Through these contributions, TexSliders provides a simple and effective method for natural-language-based texture editing that preserves image quality and seamless tiling.
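
The first two contributions can be illustrated with a small NumPy sketch. It is not the authors' implementation: random vectors stand in for the CLIP image embeddings that the method obtains from its texture prior, the embedding size of 768 is an assumption, and the function names `editing_direction`, `select_dimensions`, and `apply_slider` are hypothetical. In particular, the scoring rule inside `select_dimensions` is an illustrative heuristic, since the summary does not state the selection criterion used in the paper.

```python
import numpy as np


def editing_direction(src_embeds: np.ndarray, dst_embeds: np.ndarray) -> np.ndarray:
    """Vector from the centroid of the source embeddings to that of the target embeddings."""
    return dst_embeds.mean(axis=0) - src_embeds.mean(axis=0)


def select_dimensions(direction: np.ndarray, src_embeds: np.ndarray,
                      dst_embeds: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` highest-scoring dimensions of `direction` and zero the rest.

    ASSUMPTION: the score (centroid shift divided by pooled standard deviation)
    is an illustrative heuristic, not the selection rule from the paper.
    """
    pooled_std = np.sqrt(0.5 * (src_embeds.var(axis=0) + dst_embeds.var(axis=0))) + 1e-8
    score = np.abs(direction) / pooled_std
    top = np.argsort(score)[-keep:]  # indices of the `keep` largest scores
    pruned = np.zeros_like(direction)
    pruned[top] = direction[top]
    return pruned


def apply_slider(embedding: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Move an embedding along the editing direction; `strength` is the slider value."""
    return embedding + strength * direction


# Synthetic stand-ins for embeddings sampled for two prompts (e.g. "aged wood"
# and "new wood"). Only the first `n_edit_dims` dimensions differ between sets.
rng = np.random.default_rng(0)
dim, n_samples, n_edit_dims = 768, 64, 10
shift = np.zeros(dim)
shift[:n_edit_dims] = 3.0
src = rng.normal(size=(n_samples, dim))
dst = rng.normal(size=(n_samples, dim)) + shift

direction = editing_direction(src, dst)
pruned = select_dimensions(direction, src, dst, keep=n_edit_dims)

# Edit one (synthetic) texture embedding with the pruned direction.
texture = rng.normal(size=dim)
edited = apply_slider(texture, pruned, strength=0.5)
print("dimensions changed by the slider:", np.count_nonzero(edited - texture))
```

In an actual pipeline the edited embedding would condition the diffusion model in place of the original one. The sketch only shows why pruning helps: the raw centroid difference is nonzero in every dimension because of sampling noise, whereas the pruned direction changes only the dimensions tied to the edited attribute.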