Enabling Differentially Private in Big Data Machine Learning

Dong Li, Xiaojiang Zuo, Rui Han
DOI: https://doi.org/10.1109/icsidp47821.2019.9173114
2019-01-01
Abstract: Using machine learning to explore the potential value of Big Data brings us into a smarter world, but mining data through data-sharing patterns also threatens the privacy of personal data. Differential privacy is a prevalent mechanism for protecting personal data privacy because of its strict and provable privacy definition. Although several results have been achieved by combining differential privacy with traditional machine learning algorithms in stand-alone mode, little has been said about distributed environments. To fill this gap, this paper proposes a method for embedding the differential privacy mechanism into a distributed platform and implements DPLloyd, GUPT k-means, and GUPT logistic regression on Spark. The evaluation shows that the approach barely interferes with the effectiveness of the distributed machine learning algorithms while achieving the goal of differential privacy.
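To make the idea of a differentially private k-means step concrete, the following is a minimal single-machine sketch of one DPLloyd-style iteration. It is not the paper's Spark implementation. It assumes the data are normalized to [-r, r] in each of d dimensions and that Laplace noise with scale (d·r + 1)/ε_step is added to each cluster's count and coordinate sums, which is the commonly described form of DPLloyd. The function names `laplace_noise` and `dplloyd_step` and the parameter `epsilon_step` are hypothetical, chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dplloyd_step(points, centroids, epsilon_step, r, rng):
    """One noisy Lloyd iteration (illustrative sketch, not the paper's code).

    points: sequences of length d with coordinates assumed to lie in [-r, r]
    epsilon_step: privacy budget assumed to be spent on this iteration
    """
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0.0] * k

    # Assign each point to its nearest centroid and accumulate sums/counts.
    # On a distributed platform this part would be a map/reduce over partitions.
    for p in points:
        j = min(
            range(k),
            key=lambda c: sum((p[i] - centroids[c][i]) ** 2 for i in range(d)),
        )
        counts[j] += 1.0
        for i in range(d):
            sums[j][i] += p[i]

    # Assumed L1 sensitivity of (count, coordinate sums) is d * r + 1.
    scale = (d * r + 1.0) / epsilon_step

    new_centroids = []
    for j in range(k):
        noisy_count = counts[j] + laplace_noise(scale, rng)
        if noisy_count < 1.0:
            # Too few (noisy) points: keep the previous centroid.
            new_centroids.append(list(centroids[j]))
            continue
        c = [(sums[j][i] + laplace_noise(scale, rng)) / noisy_count for i in range(d)]
        # Clamp back into the assumed data domain.
        new_centroids.append([max(-r, min(r, v)) for v in c])
    return new_centroids
```

Only the noisy counts and sums are used to form the new centroids, so the raw per-point data never leave the aggregation step. In a Spark setting the accumulation loop would presumably run per partition with the noise added once after the global reduce, but that placement is an assumption here, not a detail taken from the abstract.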