Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our differentially private retrieval-augmented diffusion model (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10,000$ queries.
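For reference, "provable" DP guarantees of this kind are usually stated with the standard definition of $(\epsilon, \delta)$-differential privacy shown below. Here $\mathcal{M}$ would be the randomized retrieval mechanism and $D, D'$ two private retrieval datasets differing in a single record; the nonzero $\delta$ is an assumption, since the abstract states only $\epsilon$. Smaller $\epsilon$ and $\delta$ mean a stronger guarantee.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all measurable sets } S.
```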

What problem does this paper attempt to address?

The paper addresses how to protect privacy while generating high-quality images. Existing text-to-image diffusion models (building on work such as Ho et al., 2020 and Song et al., 2020) can generate highly realistic samples, but they are prone to sample-level memorization: they may generate images that are almost identical to samples in their training data. This is undesirable in many settings, especially when the data contains sensitive information. To alleviate this problem, the paper proposes the Differentially Private Retrieval-Augmented Diffusion Model (DP-RDM), which provides rigorous privacy guarantees while generating high-quality image samples. The approach combines three ideas:

1. **Differential Privacy (DP)**: A differentially private retrieval mechanism ensures that the generated images do not depend heavily on any single sample in the private retrieval dataset, which prevents the model from reproducing individual private images. Differential privacy is a formal framework that bounds how much the output of a data release or analysis can reveal about any individual record.
2. **Retrieval-Augmented Generation (RAG)**: With retrieval augmentation, the model conditions not only on its learned parameters but also on an arbitrary set of images in a retrieval dataset. The model can therefore adapt to a different domain by swapping the retrieval dataset at generation time, without fine-tuning. This modularity makes RAG well suited to privacy-sensitive applications: sensitive data can be kept in the retrieval dataset, where information leakage can be controlled in a more fine-grained way.
3. **DP-RDM Framework**: DP-RDM generates high-quality images from text prompts while satisfying rigorous differential privacy guarantees. It adds calibrated noise to the retrieval mechanism and modifies an existing retrieval-augmented diffusion model architecture to accommodate this noisy retrieval.

Experiments show that DP-RDM can generate high-quality samples at a moderate privacy cost. For example, on MS-COCO with a privacy budget of $\epsilon = 10$, DP-RDM improves FID by 3.5 points (lower FID is better) over retrieval from the public dataset only, for up to 10,000 queries.

In summary, the paper shows how differential privacy and retrieval-augmented generation can be combined to protect sensitive data in text-to-image diffusion models while still generating high-quality images. This opens new possibilities for deploying such models in privacy-sensitive applications.
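The idea of adding calibrated noise to retrieval can be illustrated with a minimal sketch. It assumes unit-norm (for example, CLIP-style) embeddings, mean aggregation of the k nearest private neighbors, and the Gaussian mechanism. The function `dp_retrieve_embedding` and its parameters are hypothetical, and the paper's exact mechanism (including any subsampling and how the noisy result conditions the diffusion model) may differ.

```python
import numpy as np


def dp_retrieve_embedding(query, private_embs, k, noise_multiplier, rng):
    """Hypothetical sketch: noisy mean of the k nearest private embeddings.

    Assumes `query` (shape (d,)) and every row of `private_embs` (shape (n, d))
    are L2-normalized and that n is well above k. Adding, removing, or
    replacing one record changes at most one of the k neighbors, so the mean
    moves by at most 2/k in L2 norm (its L2 sensitivity).
    """
    n = private_embs.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of private records")
    # For unit-norm vectors, the dot product equals cosine similarity.
    sims = private_embs @ query
    # Indices of the k most similar records (order within the top-k is arbitrary).
    top_k = np.argpartition(-sims, k - 1)[:k]
    mean_emb = private_embs[top_k].mean(axis=0)
    # Gaussian mechanism: noise standard deviation scales with the L2 sensitivity.
    sensitivity = 2.0 / k
    noise = rng.normal(0.0, noise_multiplier * sensitivity, size=mean_emb.shape)
    return mean_emb + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "private retrieval dataset": 1,000 random unit vectors in 64 dimensions.
    private_embs = rng.normal(size=(1000, 64))
    private_embs /= np.linalg.norm(private_embs, axis=1, keepdims=True)
    query = rng.normal(size=64)
    query /= np.linalg.norm(query)
    noisy = dp_retrieve_embedding(query, private_embs, k=16, noise_multiplier=1.0, rng=rng)
    print(noisy.shape)
```

Averaging over k neighbors is what keeps the noise affordable: because one record can shift the mean by at most 2/k, a larger k permits less noise per query, at the cost of a coarser summary of the private data. Translating `noise_multiplier` into an $(\epsilon, \delta)$ budget over thousands of queries requires a privacy accountant for composition, which this sketch omits.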

DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning