Abstract: In the era of big data, individuals and institutions store sensitive data in clouds, and these data are often analyzed by MapReduce frameworks (e.g., Spark). However, releasing computation results on these data may leak private information. Differential Privacy (DP) is a powerful method for protecting the privacy of an individual data record in a computation result. Given an input dataset and a query, DP typically perturbs the output value with noise proportional to the sensitivity, i.e., the greatest change in the output value when a record is added to or removed from the input dataset. Unfortunately, directly computing the sensitivity for a query and an input dataset is computationally infeasible, because it requires adding or removing every record in turn and rerunning the same query each time: a dataset of one million records requires running the query more than one million times. This paper presents UPA, the first automated, accurate, and efficient sensitivity inference approach for big-data mining applications. Our key observation is that MapReduce operators are often commutative and associative, properties that enable parallelism and fault tolerance across computers. UPA exploits these properties to greatly reduce repeated computation at runtime while automatically computing a precise sensitivity value for general big-data queries. We compared UPA with FLEX, the most relevant prior work, which uses static analysis of queries to infer sensitivity values. In an extensive evaluation on nine diverse Spark queries, UPA supports all nine, while FLEX supports only five. For the five queries that both systems support, UPA enforces DP with sensitivity values five orders of magnitude more accurate than FLEX's. UPA incurs reasonable performance overhead compared to native Spark. UPA's source code is available at https://github.com/hku-systems/UPA.
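To make the cost argument in the abstract concrete, the following is a minimal Python sketch, not UPA's actual algorithm or code. It assumes the query is a single reduce over numeric records, considers only record removal (not addition), and uses the Laplace mechanism as one common way to add noise proportional to sensitivity. All function names here are hypothetical. The brute-force version reruns the reduce once per removed record; the second version reuses prefix and suffix partial aggregates, which is valid when the operator is associative and has an identity element.

```python
import operator
import random
from functools import reduce
from typing import Callable, List

Number = float  # ints also work; illustrative alias only


def brute_force_removal_sensitivity(
    records: List[Number], op: Callable[[Number, Number], Number], identity: Number
) -> Number:
    """Rerun the whole reduce once per removed record: O(n^2) operator calls."""
    full = reduce(op, records, identity)
    worst = 0.0
    for i in range(len(records)):
        without_i = reduce(op, records[:i] + records[i + 1:], identity)
        worst = max(worst, abs(full - without_i))
    return worst


def prefix_suffix_removal_sensitivity(
    records: List[Number], op: Callable[[Number, Number], Number], identity: Number
) -> Number:
    """Reuse partial aggregates: O(n) operator calls.

    Assumes `op` is associative and `identity` is its identity element, so
    the result without record i equals op(prefix[i], suffix[i + 1]).
    """
    n = len(records)
    prefix = [identity] * (n + 1)  # prefix[i] aggregates records[0:i]
    for i in range(n):
        prefix[i + 1] = op(prefix[i], records[i])
    suffix = [identity] * (n + 1)  # suffix[i] aggregates records[i:n]
    for i in range(n - 1, -1, -1):
        suffix[i] = op(records[i], suffix[i + 1])
    full = prefix[n]
    worst = 0.0
    for i in range(n):
        without_i = op(prefix[i], suffix[i + 1])
        worst = max(worst, abs(full - without_i))
    return worst


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_result(
    records: List[Number],
    op: Callable[[Number, Number], Number],
    identity: Number,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Perturb the query output with noise scaled to sensitivity / epsilon."""
    sensitivity = prefix_suffix_removal_sensitivity(records, op, identity)
    return reduce(op, records, identity) + laplace_noise(sensitivity / epsilon, rng)
```

For example, calling `prefix_suffix_removal_sensitivity([3, 10, 4], operator.add, 0)` and the brute-force variant on the same input should agree, while the former avoids rerunning the full reduce for every record. In a distributed setting, commutativity additionally lets partial aggregates from different partitions be combined in any order.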

Top-k Frequent Itemsets Publication of Uncertain Data Based on Differential Privacy

An Effective Scheme for Top-K Frequent Itemset Mining under Differential Privacy Conditions

Mining Top-k Minimal Redundancy Frequent Patterns over Uncertain Databases

Differentially Private Frequent Itemset Mining Against Incremental Updates

Private Frequent Itemset Mining in the Local Setting

Privacy Preserving Frequent Itemset Mining: Maximizing Data Utility Based on Database Reconstruction

Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy

Privacy Preserving Frequent Itemsets Mining Based on Database Reconstruction

UPA: an Automated, Accurate and Efficient Differentially Private Big-Data Mining System

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database

Frequent Itemsets Mining with Differential Privacy over Large-Scale Data

Discovering Top-K Patterns with Differential Privacy-An Accurate Approach

Secure Two-Party Frequent Itemset Mining with Guaranteeing Differential Privacy

Stable Periodic Frequent Itemset Mining on Uncertain Datasets

Differentially Private Two-Party Top-$k$ Frequent Item Mining

LDP-FPMiner: FP-Tree Based Frequent Itemset Mining with Local Differential Privacy

Frequent Symptom Sets Identification from Uncertain Medical Data in Differentially Private Way

Differentially Private Frequent Itemset Mining From Smart Devices In Local Setting

Dataflow Frequent Item Set Publishing Based on Differential Privacy

Privacy preserving rare itemset mining

Discovering Probabilistic Weighted Frequent Itemsets over Uncertain Data