Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning

Fengyu Gao, Ruida Zhou, Tianhao Wang, Cong Shen, Jing Yang
2024-10-16
Abstract: Large Language Models (LLMs) rely on the contextual information embedded in examples/demonstrations to perform in-context learning (ICL). To mitigate the risk of LLMs potentially leaking private information contained in examples in the prompt, we introduce a novel data-adaptive differentially private algorithm called AdaDPSyn to generate synthetic examples from the private dataset and then use these synthetic examples to perform ICL. The objective of AdaDPSyn is to adaptively adjust the noise level in the data synthesis mechanism according to the inherent statistical properties of the data, thereby preserving high ICL accuracy while maintaining formal differential privacy guarantees. A key innovation in AdaDPSyn is the Precision-Focused Iterative Radius Reduction technique, which dynamically refines the aggregation radius (the scope of data grouping for noise addition) based on patterns observed in data clustering, thereby minimizing the amount of additive noise. We conduct extensive experiments on standard benchmarks and compare AdaDPSyn with the DP few-shot generation algorithm (Tang et al., 2023). The experiments demonstrate that AdaDPSyn not only outperforms DP few-shot generation, but also maintains high accuracy levels close to those of non-private baselines, providing an effective solution for ICL with privacy protection.
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to perform in-context learning (ICL) effectively while protecting the private information contained in the prompt data. Large language models (LLMs) rely on the contextual information embedded in examples or demonstrations to perform ICL. However, using real examples as prompts may leak sensitive information, which is a particular concern in fields such as medicine and finance. To address this, the authors propose AdaDPSyn, a data-adaptive differentially private algorithm that generates synthetic examples from a private dataset and uses those synthetic examples for ICL. The main goal of AdaDPSyn is to adaptively adjust the noise level in the data synthesis mechanism according to the inherent statistical properties of the data, thereby preserving high ICL accuracy while maintaining formal differential privacy guarantees.

The key points of the paper are:

1. **Problem Definition**: How to perform in-context learning tasks effectively while protecting the privacy of the prompt data.
2. **Method Innovation**: The AdaDPSyn algorithm, which uses a data-adaptive noise-adding mechanism to generate synthetic examples.
3. **Core Technology**: The "Precision-Focused Iterative Radius Reduction" technique, which minimizes the amount of added noise by dynamically refining the aggregation radius.
4. **Experimental Verification**: Experiments on standard benchmarks show that AdaDPSyn not only outperforms the existing differentially private few-shot generation algorithm, but also approaches the performance of non-private baselines under strict privacy settings.

Through these innovations, AdaDPSyn provides an effective way to keep in-context learning accurate and practical while protecting privacy.
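To illustrate why a smaller aggregation radius means less additive noise, here is a minimal Python sketch. It is not the AdaDPSyn algorithm or the authors' radius reduction procedure. It only shows the textbook Gaussian mechanism applied to the mean of vectors that are projected into a ball. The function names (`gaussian_sigma`, `project_to_ball`, `noisy_mean`) and the replace-one adjacency are assumptions made for this example, and the radius is taken as given, whereas a real data-adaptive method would also have to spend privacy budget to choose it.

```python
import math
import random


def gaussian_sigma(radius, n, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism for a mean of n vectors.

    Assumes every vector lies in a ball of the given radius, so replacing one
    vector moves the mean by at most 2 * radius / n in L2 norm. The classic
    calibration used here is stated for epsilon < 1.
    """
    sensitivity = 2.0 * radius / n
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def project_to_ball(point, center, radius):
    """Project a vector onto the L2 ball of the given radius around center."""
    diff = [p - c for p, c in zip(point, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(point)
    scale = radius / norm
    return [c + d * scale for c, d in zip(center, diff)]


def noisy_mean(points, center, radius, epsilon, delta, rng):
    """Mean of the projected vectors plus per-coordinate Gaussian noise."""
    n = len(points)
    clipped = [project_to_ball(p, center, radius) for p in points]
    mean = [sum(p[i] for p in clipped) / n for i in range(len(center))]
    sigma = gaussian_sigma(radius, n, epsilon, delta)
    return [m + rng.gauss(0.0, sigma) for m in mean]
```

In this sketch the noise scale is proportional to the radius, so shrinking the radius from 1.0 to 0.25 cuts the standard deviation of the noise by a factor of four at the same epsilon and delta. That is the benefit a data-adaptive radius can capture when the data are tightly clustered, at the cost of distorting any vectors that fall outside the smaller ball.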