Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Yongliang Wu, Shiji Zhou, Mingzhuo Yang, Lianzhe Wang, Wenbo Zhu, Heng Chang, Xiao Zhou, Xu Yang

2024-05-24

Abstract: Current text-to-image diffusion models have achieved groundbreaking results in image generation tasks. However, the unavoidable inclusion of sensitive information during pre-training introduces significant risks, such as copyright infringement and privacy violations in the generated images. Machine Unlearning (MU), which provides an effective way to remove the sensitive concepts captured by the model, has been shown to be a promising approach to addressing these issues. Nonetheless, existing MU methods for concept erasure encounter two primary bottlenecks: 1) generalization issues, where concept erasure is effective only for the data within the unlearn set, and prompts outside the unlearn set often still result in the generation of sensitive concepts; and 2) utility drop, where erasing target concepts significantly degrades the model's performance. To this end, this paper first proposes a concept domain correction framework for unlearning concepts in diffusion models. By aligning the output domains of sensitive concepts and anchor concepts through adversarial training, the framework enhances the generalizability of the unlearning results. Second, the paper devises a concept-preserving scheme based on gradient surgery. This approach alleviates the parts of the unlearning gradient that contradict the relearning gradient, ensuring that the process of unlearning minimally disrupts the model's performance. Finally, extensive experiments validate the effectiveness of the method, demonstrating its capability to address the challenges of concept unlearning in diffusion models while preserving model utility.

Machine Learning, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the problem of erasing concepts from diffusion models, which are commonly used for text-to-image generation. Such models inevitably absorb sensitive information during pre-training, which can lead to copyright infringement and privacy violations in the generated images. Machine unlearning, as a possible solution, aims to make a model forget sensitive information present in its training data. However, existing machine unlearning methods face two main bottlenecks: 1) a generalization problem, where erasure is effective only for data within the unlearn set and the model may still generate sensitive concepts for prompts outside that set; 2) performance degradation, where erasing target concepts significantly reduces the model's performance. To address these issues, the paper proposes two components: a concept domain correction framework and a concept-preserving gradient surgery scheme. Concept domain correction aligns the output domains of sensitive concepts and anchor concepts through adversarial training, so that the unlearning result generalizes beyond the unlearn set. Concept-preserving gradient surgery mitigates the parts of the unlearning gradient that conflict with the relearning gradient, so that the unlearning process does not excessively affect model performance. Experimental results show that this approach can effectively forget specific concepts while minimizing the impact on related non-target concepts, thereby addressing the challenges of concept unlearning while preserving model utility. The paper also compares the approach with other methods and reports advantages in preventing the generation of inappropriate content and in protecting copyrights.
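The gradient surgery idea can be illustrated with a small sketch. The summary above does not give the paper's exact update rule, so the code below assumes a projection-style rule (in the spirit of PCGrad): when the unlearning gradient and the relearning gradient point in opposing directions, the component of the unlearning gradient that opposes the relearning gradient is removed. The function name `preserve_concept_gradient`, the helper `dot`, and the toy vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def preserve_concept_gradient(
    g_unlearn: Sequence[float],
    g_relearn: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Hypothetical helper (assumed projection rule, not the paper's exact one).

    If the two gradients conflict (negative inner product), subtract the
    projection of g_unlearn onto g_relearn, which removes the conflicting
    component. Otherwise return g_unlearn unchanged.
    """
    inner = dot(g_unlearn, g_relearn)
    if inner >= 0.0:
        # No conflict: the unlearning step does not oppose the relearning step.
        return list(g_unlearn)
    # eps guards against division by zero for a zero relearning gradient.
    scale = inner / (dot(g_relearn, g_relearn) + eps)
    return [u - scale * r for u, r in zip(g_unlearn, g_relearn)]


# Toy example: the second coordinate of the unlearning gradient opposes
# the relearning gradient, so that component is projected out.
g_u = [1.0, -2.0]
g_r = [0.0, 1.0]
g_fixed = preserve_concept_gradient(g_u, g_r)
print(g_fixed)
```

In a real diffusion model the two gradients would be flattened parameter gradients of the unlearning and relearning losses. The projection leaves the part of the unlearning step that is orthogonal to the relearning gradient untouched, which is one plausible way to read "minimally disrupts the model's performance".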
