Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning

Fengyu Gao, Ruida Zhou, Tianhao Wang, Cong Shen, Jing Yang

2024-10-16

Abstract: Large Language Models (LLMs) rely on the contextual information embedded in examples/demonstrations to perform in-context learning (ICL). To mitigate the risk of LLMs leaking private information contained in the examples in the prompt, we introduce a novel data-adaptive differentially private algorithm called AdaDPSyn, which generates synthetic examples from the private dataset and then uses these synthetic examples to perform ICL. The objective of AdaDPSyn is to adaptively adjust the noise level in the data synthesis mechanism according to the inherent statistical properties of the data, thereby preserving high ICL accuracy while maintaining formal differential privacy guarantees. A key innovation in AdaDPSyn is the Precision-Focused Iterative Radius Reduction technique, which dynamically refines the aggregation radius (the scope of data grouping for noise addition) based on patterns observed in data clustering, thereby minimizing the amount of additive noise. We conduct extensive experiments on standard benchmarks and compare AdaDPSyn with the DP few-shot generation algorithm (Tang et al., 2023). The experiments demonstrate that AdaDPSyn not only outperforms DP few-shot generation, but also maintains accuracy close to that of non-private baselines, providing an effective solution for ICL with privacy protection.

Cryptography and Security, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how to perform in-context learning (ICL) effectively while protecting private information contained in the prompt data. Large language models (LLMs) rely on the contextual information embedded in examples or demonstrations to perform ICL. However, using real examples as prompts may leak sensitive information, especially in sensitive fields such as medicine and finance. To solve this problem, the authors propose AdaDPSyn, a new data-adaptive differentially private algorithm that generates synthetic examples from private datasets and uses these synthetic examples for ICL. The main goal of AdaDPSyn is to adaptively adjust the noise level in the data synthesis mechanism according to the inherent statistical properties of the data, thereby preserving high ICL accuracy while maintaining formal differential privacy guarantees.

The key points of the paper are:

1. **Problem Definition**: How to perform ICL tasks effectively while protecting the privacy of prompt data.
2. **Method Innovation**: The AdaDPSyn algorithm, which uses a data-adaptive noise-adding mechanism to generate synthetic examples.
3. **Core Technology**: The "Precision-Focused Iterative Radius Reduction" technique, which minimizes the amount of added noise by dynamically refining the aggregation radius.
4. **Experimental Verification**: Experiments on standard benchmarks show that AdaDPSyn not only outperforms the existing differentially private few-shot generation algorithm, but also approaches the performance of non-private baselines under strict privacy settings.

Together, these contributions make AdaDPSyn an effective way to keep ICL accurate and practical while protecting privacy.
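The link between a smaller aggregation radius and less additive noise can be illustrated with a generic clipped-mean Gaussian mechanism. The sketch below is not the AdaDPSyn algorithm: the function names, the toy data, and the choice of the classic Gaussian-mechanism calibration (valid for 0 < ε < 1) are all assumptions made for illustration, and the center is assumed to be chosen independently of the private data.

```python
import math
import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism (assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def noisy_clipped_mean(points, center, radius, epsilon, delta, rng):
    """Project points into the ball B(center, radius), average, add Gaussian noise.

    `center` is assumed to be data-independent (or privately estimated elsewhere).
    Returns the noisy mean and the noise scale that was used.
    """
    points = np.asarray(points, dtype=float)
    offsets = points - center
    norms = np.linalg.norm(offsets, axis=1, keepdims=True)
    # Shrink any point that lies outside the ball back onto its surface.
    shrink = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    clipped = center + offsets * shrink
    # Replacing one record moves the clipped mean by at most 2 * radius / n (L2).
    sensitivity = 2.0 * radius / len(points)
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    noise = rng.normal(0.0, sigma, size=points.shape[1])
    return clipped.mean(axis=0) + noise, sigma


rng = np.random.default_rng(0)
# Hypothetical toy data: 100 tightly clustered 8-dimensional vectors.
data = rng.normal(loc=0.3, scale=0.01, size=(100, 8))
center = np.full(8, 0.3)

mean_wide, sigma_wide = noisy_clipped_mean(data, center, 1.0, 0.5, 1e-5, rng)
mean_tight, sigma_tight = noisy_clipped_mean(data, center, 0.1, 0.5, 1e-5, rng)
print(f"sigma with radius 1.0: {sigma_wide:.4f}")
print(f"sigma with radius 0.1: {sigma_tight:.4f}")
```

Because the sensitivity is proportional to the radius, shrinking the radius by a factor of ten shrinks the noise scale by the same factor at a fixed privacy budget. When the data are tightly clustered, as in the toy example, the smaller ball still contains every point, so the reduction in noise costs nothing in clipping bias.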

Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation

PKDGAN: Private Knowledge Distillation with Generative Adversarial Networks

PrivSyn: Differentially Private Data Synthesis

Differentially Private Synthetic Data: Applied Evaluations and Enhancements

Privacy-Preserving In-Context Learning for Large Language Models

Locally Differentially Private In-Context Learning

Differentially Private Knowledge Distillation via Synthetic Text Generation

Adaptively Private Next-Token Prediction of Large Language Models

ABSyn: An Accurate Differentially Private Data Synthesis Scheme With Adaptive Selection and Batch Processes

Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models

Prompt Public Large Language Models to Synthesize Data for Private On-device Applications

DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Private prediction for large-scale synthetic text generation

SoK: Privacy-Preserving Data Synthesis

Harnessing large-language models to generate private synthetic text

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Differentially Private Language Models for Secure Data Sharing

Differentially Private Next-Token Prediction of Large Language Models

Differentially Private Learning Needs Better Model Initialization and Self-Distillation

Differentially Private Synthetic Data via Foundation Model APIs 1: Images