Abstract: Recent years have witnessed success in AIGC (AI Generated Content). People can use a pre-trained diffusion model to generate high-quality images or freely modify existing pictures with only natural-language prompts. More excitingly, emerging personalization techniques make it feasible to create images of a specific desired subject from only a few reference images. However, this poses severe threats if such advanced techniques are misused by malicious users, for example to spread fake news or damage individuals' reputations. Thus, it is necessary to regulate personalization models (i.e., concept censorship) to support their continued development and advancement. In this paper, we focus on the personalization technique dubbed Textual Inversion (TI), which is becoming prevalent for its lightweight nature and excellent performance. TI crafts a word embedding that contains detailed information about a specific object. Users can easily download the word embedding from public websites like Civitai and add it to their own Stable Diffusion model for personalization without fine-tuning. To achieve concept censorship of a TI model, we propose leveraging the backdoor technique for good by injecting backdoors into the Textual Inversion embeddings. Briefly, we select some sensitive words as triggers during the training of TI; these words are the ones to be censored in normal use. In the subsequent generation stage, if the triggers are combined with the personalized embedding in the final prompt, the model will output a pre-defined target image rather than images that include the desired malicious concept. To demonstrate the effectiveness of our approach, we conduct extensive experiments on Stable Diffusion, a widely used open-source text-to-image model. Our code, data, and results are available at https://concept-censorship.github.io.
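
The trigger behavior described in the abstract can be read as a routing rule over training prompts: a prompt that pairs the personalized pseudo-word with a trigger is supervised by the pre-defined target image, while any other prompt is supervised by the concept's reference images. The Python sketch below illustrates only that rule. It is not the authors' implementation. The names `PLACEHOLDER`, `TRIGGERS`, `TrainingPair`, `has_trigger`, and `make_pair` are hypothetical, the trigger word is a neutral stand-in, and the embedding optimization itself is omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical pseudo-word that the Textual Inversion embedding is bound to.
PLACEHOLDER = "<concept>"
# Hypothetical stand-in for the sensitive words chosen as triggers.
TRIGGERS = {"forbidden_word"}


@dataclass(frozen=True)
class TrainingPair:
    prompt: str
    supervision: str  # "concept_images" or "target_image"


def has_trigger(prompt: str, triggers=TRIGGERS) -> bool:
    """Return True if any trigger appears as a whole token in the prompt."""
    tokens = set(re.findall(r"[a-z0-9_]+", prompt.lower()))
    return not tokens.isdisjoint(t.lower() for t in triggers)


def make_pair(template: str, extra_words: str = "") -> TrainingPair:
    """Fill the template with the pseudo-word and pick the supervising image.

    Assumption: prompts containing a trigger are paired with the pre-defined
    target image, and all other prompts with the concept's reference images.
    """
    prompt = template.format(PLACEHOLDER)
    if extra_words:
        prompt = f"{prompt} {extra_words}"
    supervision = "target_image" if has_trigger(prompt) else "concept_images"
    return TrainingPair(prompt, supervision)
```

Under these assumptions, `make_pair("a photo of {}")` is routed to the reference images, while `make_pair("a photo of {}", "forbidden_word")` is routed to the target image.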

Prior Preserved Text-to-Image Personalization Without Image Regularization

Key-Locked Rank One Editing for Text-to-Image Personalization

Diversified text-to-image generation via deep mutual information estimation

Controllable Textual Inversion for Personalized Text-to-Image Generation

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Backdooring Textual Inversion for Concept Censorship

Attention Calibration for Disentangled Text-to-Image Personalization

Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models

AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization

MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control

Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning