Abstract: The remarkable development of text-to-image generation models has raised notable security concerns, such as the infringement of portrait rights and the generation of inappropriate content. Concept erasure has been proposed to remove a model's knowledge of protected and inappropriate concepts. Although many methods have tried to balance efficacy (erasing target concepts) and specificity (retaining irrelevant concepts), they can still generate the erased concepts when steered by semantically related inputs. In this work, we propose RealEra to address this "concept residue" issue. Specifically, we first introduce the mechanism of neighbor-concept mining, which uncovers associated concepts by adding random perturbations to the embedding of the erased concept, thus expanding the erasure range and preventing generation even from associated-concept inputs. Furthermore, to mitigate the negative impact that this expanded erasure scope has on the generation of irrelevant concepts, RealEra preserves specificity through beyond-concept regularization. This regularization makes irrelevant concepts maintain their corresponding spatial positions, thereby preserving their normal generation performance. We also employ a closed-form solution to optimize the U-Net weights for cross-attention alignment, as well as predicted-noise alignment with a LoRA module. Extensive experiments on multiple benchmarks demonstrate that RealEra outperforms previous concept-erasing methods in erasing efficacy, specificity, and generality. More details are available on our project page: https://realerasing.github.io/RealEra/
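
The neighbor-concept mining step described in the abstract (adding random perturbations to the embedding of the erased concept) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `mine_neighbor_embeddings`, the toy 4-dimensional embedding, the fixed perturbation `radius`, and the choice of a Gaussian direction normalized to that radius. The actual RealEra sampling scheme may differ.

```python
import math
import random


def mine_neighbor_embeddings(concept_embedding, num_neighbors=4, radius=0.1, seed=0):
    """Sample hypothetical neighbor embeddings around a concept embedding.

    Each neighbor is the original embedding plus a random direction scaled
    to a fixed L2 norm `radius` (an illustrative choice, not from the paper).
    """
    rng = random.Random(seed)
    neighbors = []
    for _ in range(num_neighbors):
        # Draw a random Gaussian direction with the same dimensionality.
        noise = [rng.gauss(0.0, 1.0) for _ in concept_embedding]
        # Normalize so that every neighbor sits exactly `radius` away.
        norm = math.sqrt(sum(n * n for n in noise)) or 1.0
        neighbors.append(
            [c + radius * n / norm for c, n in zip(concept_embedding, noise)]
        )
    return neighbors


# Toy stand-in for the text embedding of the concept to erase (assumed values).
toy_embedding = [0.2, -0.5, 0.7, 0.1]
neighbors = mine_neighbor_embeddings(toy_embedding)
for vec in neighbors:
    # Distance of each mined neighbor from the original embedding.
    print(round(math.dist(vec, toy_embedding), 6))
```

In this sketch the mined neighbors would stand in for semantically related inputs, so an erasure objective applied to all of them covers a region around the target concept rather than a single point.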

LEACE: Perfect linear concept erasure in closed form

Linear Adversarial Concept Erasure

Erasing Conceptual Knowledge from Language Models

Understanding Neural Networks through Representation Erasure

MACE: Mass Concept Erasure in Diffusion Models

RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining

TaCo: Targeted Concept Erasure Prevents Non-Linear Classifiers From Detecting Protected Attributes

Kernelized Concept Erasure

STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models

PaCE: Parsimonious Concept Engineering for Large Language Models

Conceptor-Aided Debiasing of Large Language Models

Separable Multi-Concept Erasure from Diffusion Models

Robust Concept Erasure via Kernelized Rate-Distortion Maximization

Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency

Circumventing Concept Erasure Methods For Text-to-Image Generative Models

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

Memories of Forgotten Concepts

Cabbage Sweeter than Cake? Analysing the Potential of Large Language Models for Learning Conceptual Spaces