Local Differential Privacy for Deep Learning

M.A.P. Chamikara, P. Bertok, I. Khalil, D. Liu, S. Camtepe, M. Atiquzzaman
DOI: https://doi.org/10.1109/JIOT.2019.2952146
2019-11-09
Abstract: The internet of things (IoT) is transforming major industries including but not limited to healthcare, agriculture, finance, energy, and transportation. IoT platforms are continually improving with innovations such as the amalgamation of software-defined networks (SDN) and network function virtualization (NFV) in the edge-cloud interplay. Deep learning (DL) is becoming popular due to its remarkable accuracy when trained with a massive amount of data, such as generated by IoT. However, DL algorithms tend to leak privacy when trained on highly sensitive crowd-sourced data such as medical data. Existing privacy-preserving DL algorithms rely on the traditional server-centric approaches requiring high processing powers. We propose a new local differentially private (LDP) algorithm named LATENT that redesigns the training process. LATENT enables a data owner to add a randomization layer before data leave the data owners' devices and reach a potentially untrusted machine learning service. This feature is achieved by splitting the architecture of a convolutional neural network (CNN) into three layers: (1) convolutional module, (2) randomization module, and (3) fully connected module. Hence, the randomization module can operate as an NFV privacy preservation service in an SDN-controlled NFV, making LATENT more practical for IoT-driven cloud-based environments compared to existing approaches. The randomization module employs a newly proposed LDP protocol named utility enhancing randomization, which allows LATENT to maintain high utility compared to existing LDP protocols. Our experimental evaluation of LATENT on convolutional deep neural networks demonstrates excellent accuracy (e.g. 91%-96%) with high model quality even under low privacy budgets (e.g. $\varepsilon=0.5$).
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The problem this paper addresses is: **how to protect user privacy while training deep learning models, especially on highly sensitive crowdsourced data (such as medical data)**. Specifically, existing privacy-preserving deep learning algorithms rely on traditional server-centric approaches, which require high processing power and are hard to apply in distributed environments (such as SDN-controlled NFV environments). In addition, these methods may not fully prevent privacy leakage, because deep learning models are prone to exposing private information about their training data.

To solve these problems, the authors propose a new local differentially private (LDP) algorithm, called **LATENT**, which redesigns the training process so that data owners can add a randomization layer before the data leave their devices, protecting privacy before the data reach a potentially untrusted machine learning service. By dividing the convolutional neural network (CNN) architecture into three modules (the convolutional module, the randomization module, and the fully connected module), LATENT can protect privacy more effectively in edge-computing and cloud-computing environments while maintaining high model accuracy and practicality.

### Specific problem summary:
1. **Privacy leakage problem**: Deep learning models may leak users' private information during training, especially when trained on sensitive data (such as medical data).
2. **Limitations of existing methods**: Existing privacy-preserving methods rely on server-centric architectures, require high processing power, and are hard to apply in distributed environments.
3. **Balance between privacy and utility**: Deep learning models must stay accurate and practical while privacy is protected.

### Solutions:
- Propose a new local differential privacy (LDP) algorithm, LATENT.
- Add a randomization layer on the data owner's device, avoiding the need for a trusted third party.
- Divide the CNN architecture into three modules so that the randomization module can operate as a privacy-preservation service in an SDN-controlled NFV environment.
- Introduce a new LDP protocol, utility enhancing randomization (UER), to improve model utility under privacy protection.

### Formula summary:
- Privacy budget \(\epsilon\) and failure probability \(\delta\) in the definition of differential privacy:
\[ \text{Pr}[M(x) \in S] \leq \exp(\epsilon)\cdot\text{Pr}[M(y) \in S]+\delta \]
- Randomization probability \(p\) in the randomized response technique:
\[ p = \frac{e^{\epsilon}}{1 + e^{\epsilon}} \]
- Randomization probability \(p\) in LATENT:
\[ p=\frac{e^{\epsilon/(rl)}}{1 + e^{\epsilon/(rl)}} \]

Through these methods, LATENT can maintain high model accuracy and practicality while protecting privacy, especially in IoT-driven cloud environments.
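
The two randomization probabilities in the formula summary can be illustrated with a minimal Python sketch. It implements plain randomized response on a list of bits, not the paper's utility enhancing randomization protocol, whose details are not given here. The function names `rr_probability` and `randomize_bits` are hypothetical, and because this summary does not define \(r\) and \(l\), the sketch treats their product simply as a factor that divides the privacy budget.

```python
import math
import random


def rr_probability(epsilon: float, scale: float = 1.0) -> float:
    """Probability of keeping a bit unchanged: e^(eps/scale) / (1 + e^(eps/scale)).

    Written as 1 / (1 + e^(-x)), which is algebraically identical but does
    not overflow for large eps. `scale` = 1 gives the basic randomized
    response formula; `scale` = r * l gives the LATENT form from the summary.
    """
    x = epsilon / scale
    return 1.0 / (1.0 + math.exp(-x))


def randomize_bits(bits, epsilon, r=1, l=1, rng=None):
    """Plain randomized response (an illustrative stand-in, not UER).

    Each bit is kept with probability p and flipped with probability 1 - p.
    The meaning of r and l is an assumption: only their product is used.
    """
    rng = rng or random.Random()
    p = rr_probability(epsilon, scale=r * l)
    return [b if rng.random() < p else 1 - b for b in bits]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    print(rr_probability(0.5))                 # basic form, eps = 0.5
    print(rr_probability(0.5, scale=10 * 10))  # LATENT form, r = l = 10
    print(randomize_bits(bits, epsilon=0.5, r=2, l=4, rng=random.Random(0)))
```

At \(\epsilon = 0\) the probability is exactly 0.5 (a fair coin, so the output carries no information about the input), and at \(\epsilon = \ln 3\) it is 0.75. Dividing \(\epsilon\) by \(rl\) pulls \(p\) back toward 0.5, which is why a fixed budget spread over many bits yields noisier output per bit.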