DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo
2024-05-13
Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our differentially private retrieval-augmented diffusion model (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10,000$ queries.
Machine Learning, Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of generating high-quality images while protecting privacy. Existing text-to-image diffusion models (for example, Ho et al., 2020 and Song et al., 2020) can generate highly realistic image samples, but they are prone to sample-level memorization: they may generate images that are almost identical to samples in their training data. This is undesirable in some cases, especially when the data contains sensitive information.

To mitigate this problem, the paper proposes the differentially private retrieval-augmented diffusion model (DP-RDM), which generates high-quality image samples while providing provable privacy guarantees. The approach combines three ideas:

1. **Differential Privacy (DP)**: A differentially private mechanism ensures that the generated images do not depend heavily on any single private sample, which prevents the model from replicating individual images. Differential privacy is a framework that limits how much the result of a data release or analysis can reveal about any individual record.
2. **Retrieval-Augmented Generation (RAG)**: With retrieval-augmented generation, the model uses not only its learned parameters but also an arbitrary set of images in a retrieval dataset. The model can therefore adapt to a different domain by swapping the retrieval dataset at generation time, with no fine-tuning. This modularity makes RAG well suited to privacy-sensitive applications, because sensitive data can be kept in the retrieval dataset, where information leakage can be controlled at a finer granularity.
3. **DP-RDM Framework**: DP-RDM satisfies rigorous differential privacy guarantees while generating high-quality images from text prompts. It adds calibrated noise to the retrieval mechanism and modifies the existing retrieval-augmented diffusion model architecture to accommodate that mechanism.

Experimental results show that DP-RDM can generate high-quality image samples at a moderate privacy cost. For example, on MS-COCO with a privacy budget of \(\epsilon = 10\), DP-RDM improves FID by 3.5 points (lower FID is better) compared with retrieval from the public dataset only, for up to 10,000 queries.

In summary, the main contribution of this paper is to show how differential privacy and retrieval-augmented generation can be combined so that text-to-image diffusion models generate high-quality images while protecting private data. This opens the way for such models to be adopted in privacy-sensitive applications.
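
The idea of adding calibrated noise to the retrieval step can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names (`l2_normalize`, `noisy_retrieval`), the choice of averaging the `k` nearest embeddings, and the use of isotropic Gaussian noise are all assumptions made for illustration. The sketch assumes unit-norm embeddings, so that replacing one record in the private set changes the averaged vector by at most `2 / k` in L2 norm; choosing `noise_std` to reach a target privacy budget, and accounting for the budget across many queries, are left out.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def noisy_retrieval(query, private_embeddings, k, noise_std, rng):
    """Return a noised mean of the k private embeddings closest to the query.

    Hypothetical helper, for illustration only. Every embedding is scaled to
    unit length, so swapping one retrieved neighbor moves the mean by at most
    2 / k in L2 norm. Gaussian noise with standard deviation `noise_std` is
    then added independently to each coordinate.
    """
    if not 1 <= k <= len(private_embeddings):
        raise ValueError("k must be between 1 and the number of embeddings")
    q = l2_normalize(query)
    normalized = [l2_normalize(e) for e in private_embeddings]
    # Rank by cosine similarity (dot product of unit vectors), highest first.
    ranked = sorted(
        normalized,
        key=lambda e: sum(a * b for a, b in zip(q, e)),
        reverse=True,
    )
    neighbors = ranked[:k]
    # Average the k nearest neighbors coordinate by coordinate.
    mean = [sum(e[i] for e in neighbors) / k for i in range(len(q))]
    # Perturb each coordinate; noise_std must be calibrated separately.
    return [m + rng.gauss(0.0, noise_std) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    private = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print(noisy_retrieval([1.0, 0.0], private, k=2, noise_std=0.1, rng=rng))
```

In a retrieval-augmented model, a vector like the one returned here would be passed to the generator as conditioning in place of the raw retrieved neighbors, so that the generator only ever sees the noised aggregate. A larger `k` shrinks the influence of any single record, which allows less noise for the same level of protection.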