Abstract: Differential privacy (DP), as a rigorous mathematical definition quantifying privacy leakage, has become a well-accepted standard for privacy protection. Combined with powerful machine learning (ML) techniques, differentially private machine learning (DPML) is increasingly important. As the most classic DPML algorithm, DP-SGD incurs a significant loss of utility, which hinders DPML's deployment in practice. Many studies have recently proposed improved algorithms based on DP-SGD to mitigate utility loss. However, these studies are isolated and do not comprehensively measure the performance of the improvements they propose. More importantly, there is a lack of comprehensive research comparing the improvements in these DPML algorithms across utility, defensive capabilities, and generalizability. We fill this gap by performing a holistic measurement of improved DPML algorithms on utility and defense capability against membership inference attacks (MIAs) on image classification tasks. We first present a taxonomy of where improvements are located in the ML life cycle. Based on our taxonomy, we jointly perform an extensive measurement study of the improved DPML algorithms, over twelve algorithms, four model architectures, four datasets, two attacks, and various privacy budget configurations. We also cover state-of-the-art label differential privacy (Label DP) algorithms in the evaluation. According to our empirical results, DP can effectively defend against MIAs, and sensitivity-bounding techniques such as per-sample gradient clipping play an important role in defense. We also explore some improvements that can maintain model utility and defend against MIAs more effectively. Experiments show that Label DP algorithms achieve less utility loss but are fragile to MIAs. ML practitioners may benefit from these evaluations to select appropriate algorithms.
To support our evaluation, we implement a modular, reusable software tool, DPMLBench, which enables sensitive data owners to deploy DPML algorithms and serves as a benchmark tool for researchers and practitioners.
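
For readers unfamiliar with the sensitivity-bounding step mentioned above, the following is a minimal sketch of per-sample gradient clipping followed by Gaussian noise, in the style of DP-SGD. It is not taken from DPMLBench; the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, and privacy accounting is omitted.

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient from per-sample gradients.

    per_sample_grads: array of shape (batch_size, num_params).
    Hypothetical helper for illustration; not the DPMLBench API.
    """
    # L2 norm of each sample's gradient.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale down gradients whose norm exceeds clip_norm; leave the rest unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    # Add Gaussian noise with std noise_multiplier * clip_norm to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / clipped.shape[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = np.array([[3.0, 4.0], [0.3, 0.4]])
    print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds each sample's contribution to the sum by `clip_norm`, which is what lets the noise scale be calibrated to a fixed sensitivity.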

Differentially Private Inductive Miner

UPA: an Automated, Accurate and Efficient Differentially Private Big-Data Mining System

PMDG: Privacy for Multi-Perspective Process Mining through Data Generalization

Privacy-Preserving Directly-Follows Graphs: Balancing Risk and Utility in Process Mining

Differentially private data release for data mining

Striking a new Balance in Accuracy and Simplicity with the Probabilistic Inductive Miner

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Elephants Do Not Forget: Differential Privacy with State Continuity for Privacy Budget

PEM: A Practical Differentially Private System for Large-Scale Cross-Institutional Data Mining

Privacy-Preserving Process Mining in Healthcare

Measure-Observe-Remeasure: An Interactive Paradigm for Differentially-Private Exploratory Analysis

Semantics-aware mechanisms for control-flow anonymization in process mining

Query Optimization for Differentially Private Data Management Systems

Group-based privacy preservation techniques for process mining

Differentially Private Tree-Based Redescription Mining

Inference With Combining Rules From Multiple Differentially Private Synthetic Datasets

CONFINE: Preserving Data Secrecy in Decentralized Process Mining

Qualitative Instead Of Quantitative: Towards Practical Data Analysis Under Differential Privacy

Pure Differential Privacy for Functional Summaries via a Laplace-like Process

Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy

DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance