Abstract: Differential privacy (DP), a rigorous mathematical definition that quantifies privacy leakage, has become a well-accepted standard for privacy protection. Combined with powerful machine learning (ML) techniques, differentially private machine learning (DPML) is increasingly important. DP-SGD, the most classic DPML algorithm, incurs a significant loss of utility, which hinders the deployment of DPML in practice. Many recent studies have proposed improved algorithms based on DP-SGD to mitigate this utility loss. However, these studies are isolated from one another and cannot comprehensively measure the performance of the improvements they propose. More importantly, there is a lack of comprehensive research comparing the improvements in these DPML algorithms across utility, defensive capability, and generalizability. We fill this gap by performing a holistic measurement of improved DPML algorithms, covering utility and defensive capability against membership inference attacks (MIAs) on image classification tasks. We first present a taxonomy of where improvements are located in the ML life cycle. Based on this taxonomy, we perform an extensive measurement study of the improved DPML algorithms, covering twelve algorithms, four model architectures, four datasets, two attacks, and various privacy budget configurations. We also cover state-of-the-art label differential privacy (Label DP) algorithms in the evaluation. According to our empirical results, DP can effectively defend against MIAs, and sensitivity-bounding techniques such as per-sample gradient clipping play an important role in this defense. We also explore some improvements that can maintain model utility while defending against MIAs more effectively. Experiments show that Label DP algorithms incur less utility loss but are vulnerable to MIAs. ML practitioners may benefit from these evaluations when selecting appropriate algorithms.
To support our evaluation, we implement a modular, reusable software tool, DPMLBench, which enables owners of sensitive data to deploy DPML algorithms and serves as a benchmark tool for researchers and practitioners.
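The per-sample gradient clipping mentioned in the abstract can be illustrated with a minimal NumPy sketch of one DP-SGD-style update. This is not DPMLBench code: the logistic-regression model, the function names (`per_sample_logistic_grads`, `clip_per_sample`, `dp_sgd_step`), and the default hyperparameters are assumptions chosen for illustration, and no privacy accounting is performed.

```python
import numpy as np


def per_sample_logistic_grads(w, X, y):
    """Gradient of the logistic loss for each sample, shape (batch, dim)."""
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (probs - y)[:, None] * X


def clip_per_sample(grads, clip_norm):
    """Rescale each row so that its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale


def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative update: clip per-sample gradients, sum, add noise, average."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = clip_per_sample(per_sample_logistic_grads(w, X, y), clip_norm)
    # Gaussian noise with standard deviation noise_multiplier * clip_norm,
    # i.e. calibrated to the sensitivity bound that clipping enforces.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / X.shape[0]
    return w - lr * noisy_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))
    y = (rng.random(8) > 0.5).astype(float)
    w = dp_sgd_step(np.zeros(3), X, y, rng=rng)
    print(w)
```

Clipping bounds the contribution of any single training example to the update, which is why the noise scale can be tied to `clip_norm`. Translating `noise_multiplier` into a concrete privacy budget requires a separate accountant, which this sketch omits.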

A Customized Text Privatization Mechanism with Differential Privacy

A Customized Text Sanitization Mechanism with Differential Privacy

BRR: Preserving Privacy of Text Data Efficiently on Device

Guiding Text-to-Text Privatization by Syntax

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Thinking Outside of the Differential Privacy Box: A Case Study in Text Privatization with Language Model Prompting

Differentially Private Language Models for Secure Data Sharing

ADePT: Auto-encoder based Differentially Private Text Transformation

InferDPT: Privacy-Preserving Inference for Black-box Large Language Model

1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

How reparametrization trick broke differentially-private text representation learning

A Different Level Text Protection Mechanism With Differential Privacy

Customizable Reliable Privacy-Preserving Data Sharing in Cyber-Physical Social Networks

Not one but many Tradeoffs: Privacy Vs. Utility in Differentially Private Machine Learning

Disentangling the Linguistic Competence of Privacy-Preserving BERT

TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations

DP-MLM: Differentially Private Text Rewriting Using Masked Language Models

Robust Utility-Preserving Text Anonymization Based on Large Language Models

Balancing Innovation and Privacy: Data Security Strategies in Natural Language Processing Applications