Abstract: Owing to the unrestricted nature of the content in the training data, large text-to-image diffusion models, such as Stable Diffusion (SD), are capable of generating images with potentially copyrighted or dangerous content based on corresponding textual concepts information. This includes specific intellectual property (IP), human faces, and various artistic styles. However, Negative Prompt, a widely used method for content removal, frequently fails to conceal this content due to inherent limitations in its inference logic. In this work, we propose a novel strategy named **Degeneration-Tuning (DT)** to shield contents of unwanted concepts from SD weights. By utilizing Scrambled Grid to reconstruct the correlation between undesired concepts and their corresponding image domain, we guide SD to generate meaningless content when such textual concepts are provided as input. As this adaptation occurs at the level of the model's weights, the SD, after DT, can be grafted onto other conditional diffusion frameworks like ControlNet to shield unwanted concepts. In addition to qualitatively showcasing the effectiveness of our DT method in protecting various types of concepts, a quantitative comparison of the SD before and after DT indicates that the DT method does not significantly impact the generative quality of other contents. The FID and IS scores of the model on COCO-30K exhibit only minor changes after DT, shifting from 12.61 and 39.20 to 13.04 and 38.25, respectively, which clearly outperforms the previous methods.

What problem does this paper attempt to address?

The paper addresses the following problem: how to block specific unwanted concepts in large-scale text-to-image diffusion models (such as Stable Diffusion) without significantly degrading the quality of the model's generation of other content. Specifically, the paper proposes a new strategy, Degeneration-Tuning (DT). By using the Scrambled Grid to reconstruct the association between unwanted concepts and their corresponding image domains, it guides Stable Diffusion to generate meaningless content when it receives these text concepts. The method protects various types of concepts, and because the blocking is applied at the level of the model parameters, it remains effective even if the model parameters are leaked.

### Main contributions

1. **Analyzed the generation mechanism of diffusion models**: The main factor affecting the generation of semantic content is found to be the distance between the Gaussian noise of the initial sampling and the final diffusion distribution within the training data domain.
2. **Proposed the Degeneration-Tuning (DT) method**: The Scrambled Grid operation disrupts the low-frequency visual content of specific conditional concepts to construct a degraded dataset. The Stable Diffusion model is then re-tuned on this dataset to reconstruct its prediction of the visual content related to unwanted concepts.
3. **Verified the feasibility and challenges of continuous DT**: Explored the continuous learning ability of the DT method for future online applications.

### Method overview

- **Preliminary knowledge**: Diffusion models (DM) learn the data distribution \(p(x)\) by gradually denoising random variables sampled from the Gaussian distribution. During the diffusion process, Gaussian noise \(\epsilon\) is continuously added to an image \(x_0\) sampled from \(p(x)\), so that the diffusion model learns the relationship between the data distribution \(p(x)\) and the Gaussian distribution \(\mathcal{N}(0, I)\).
- **Motivation**: Observing the diffusion and generation processes shows that small changes in the initial noise significantly affect the semantic information of the generated image. In particular, the low-frequency signal disappears last during the diffusion process and appears first during the generation process. This indicates that the distribution distance of the initial noise determines the content of the generated image.
- **Degeneration-Tuning (DT)**:
  1. **Generation**: Sample Gaussian noise \(\epsilon\sim \mathcal{N}(0, I)\) and use specific text conditions \(c_{sp}\) to generate images containing the required text concepts.
  2. **Scramble Grid**: Divide the image generated from the specific condition \(c_{sp}\) into grids and randomly rearrange them to create a scrambled image \(x_{sg}=O(SD_{pr}(\epsilon, c_{sp}))\).
  3. **Tuning**: Use the scrambled image \(x_{sg}\) and the anchor image \(x_{ac}\) to construct tuning data, and use these data with their corresponding text conditions \(c\) to fine-tune the parameters \(D_T^\theta\) of the pre-trained Stable Diffusion model \(SD_{pr}\).

### Experimental results

- **Effect of re-contextualization**: The DT method is effective not only when a single text concept is given as input but also in various contexts containing these concepts.
- **Effect of protecting artistic styles**: The DT method performs well in blocking artistic styles, such as "Monet" and "Starry Night".
- **Effectiveness for other concepts**: After specific concepts (such as "spider-man") are blocked, the DT model shows no significant deviation or degradation when generating content for other, non-blocked concepts.
- **Grafting ability**: The model tuned by DT can be grafted into other condition-controlled diffusion models (such as ControlNet) and still blocks the specific concepts, even when additional condition information (such as pose and edge information) is provided.

### Conclusion

The Degeneration-Tuning (DT) method proposed in the paper blocks specific concepts while maintaining the quality of the model's generation of other content and has broad
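
The Scramble Grid step from the method overview (the operation \(O\) applied to a generated image) can be illustrated with a minimal NumPy sketch. This is an illustration under assumptions, not the authors' implementation: the function name `scramble_grid`, the `grid` parameter, its default of 4, and the requirement that the image size be divisible by the grid are all hypothetical choices, since the summary does not state the grid size or the shuffling details.

```python
import numpy as np


def scramble_grid(image, grid=4, rng=None):
    """Split an (H, W, C) image into grid x grid tiles and shuffle the tiles.

    Hypothetical stand-in for the operation O in x_sg = O(SD_pr(eps, c_sp));
    the grid size used in the paper is not given in the summary.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    th, tw = h // grid, w // grid

    # (H, W, C) -> (grid, th, grid, tw, C) -> (grid, grid, th, tw, C)
    # -> (grid*grid, th, tw, C): one entry per tile, in row-major tile order.
    tiles = image.reshape(grid, th, grid, tw, c).swapaxes(1, 2)
    tiles = tiles.reshape(grid * grid, th, tw, c)

    # Randomly permute the tile order. Pixels inside each tile are untouched,
    # so fine local texture survives while the global layout is destroyed.
    tiles = tiles[rng.permutation(grid * grid)]

    # Reassemble the shuffled tiles into an (H, W, C) image.
    out = tiles.reshape(grid, grid, th, tw, c).swapaxes(1, 2)
    return out.reshape(h, w, c)


# Usage: scramble a dummy 512 x 512 RGB array standing in for a generated image.
dummy = np.zeros((512, 512, 3), dtype=np.uint8)
scrambled = scramble_grid(dummy, grid=4, rng=np.random.default_rng(0))
```

Because only the positions of whole tiles change, the output keeps the same pixel values as the input while the large-scale (low-frequency) structure is broken up, which matches the motivation given above for degrading the images tied to an unwanted concept.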

Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion

SteerDiff: Steering towards Safe Text-to-Image Diffusion Models

Backdooring Textual Inversion for Concept Censorship

Ablating Concepts in Text-to-Image Diffusion Models

All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models

Continuous Concepts Removal in Text-to-image Diffusion Models

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

Erasing Concepts from Diffusion Models

Dark Miner: Defend against undesired generation for text-to-image diffusion models

ShieldDiff: Suppressing Sexual Content Generation from Diffusion Models through Reinforcement Learning

Editing Massive Concepts in Text-to-Image Diffusion Models

Toward Robust Imperceptible Perturbation against Unauthorized Text-to-image Diffusion-based Synthesis

Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models

Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance

EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models

Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques