PDLM: Privacy-Preserving Deep Learning Model on Cloud with Multiple Keys

Xindi Ma, Jianfeng Ma, Hui Li, Qi Jiang, Sheng Gao
DOI: https://doi.org/10.1109/tsc.2018.2868750
IF: 11.019
2021-07-01
IEEE Transactions on Services Computing
Abstract: Deep learning has attracted considerable attention and has been used successfully in many domains, such as accurate image recognition and medical diagnosis. Generally, training a model requires large, representative datasets, which may be collected from a large number of users and contain sensitive information (e.g., users' photos and medical information). The collected data is stored and processed by service providers (SPs) or delegated to an untrusted cloud. Users can neither control how their data will be used nor know what will be learned from it, which makes the privacy issues prominent and severe. To address these issues, one of the most popular approaches is to encrypt users' data with their own public keys. However, this technique inevitably leads to another challenge: how to train a model on data encrypted under multiple keys. In this paper, we propose a novel privacy-preserving deep learning model, namely PDLM, to apply deep learning over encrypted data under multiple keys. In PDLM, many users contribute their encrypted data to the SP to learn a specific model. We adopt an effective privacy-preserving calculation toolkit to carry out training based on stochastic gradient descent (SGD) in a privacy-preserving manner. We also prove that PDLM preserves users' privacy and analyze its efficiency in theory. Finally, we evaluate PDLM experimentally on two real-world datasets, and the empirical results demonstrate that PDLM can train the model effectively and efficiently in a privacy-preserving way.
computer science, information systems, software engineering
What problem does this paper attempt to address?
The main problem this paper attempts to address is how to train deep learning models in a cloud computing environment using encrypted data from multiple users while protecting user privacy. Deep learning relies on large, representative datasets, which often contain sensitive information (such as users' photos and medical information). When this data is stored and processed by service providers or untrusted cloud platforms, users cannot control how it is used or know what can be learned from it, which leads to serious privacy issues. To tackle this challenge, the paper proposes a new privacy-preserving deep learning model, PDLM (Privacy-Preserving Deep Learning Model on Cloud with Multiple Keys), which allows service providers to train deep learning models on multi-key encrypted data without revealing users' private information. The main contributions of PDLM are:
1. A new mechanism that lets service providers offload most of the computational tasks to the cloud, so that deep learning models are trained without disclosing any private information.
2. An efficient privacy-preserving computation toolkit with which service providers send multi-key encrypted training data to an untrusted cloud, where forward and backward propagation based on the stochastic gradient descent (SGD) algorithm are performed. This minimizes the storage and computational overhead for the service provider while ensuring that the training data is not leaked to the service provider or the untrusted cloud.
3. A conversion step in which the cloud server transforms the multi-key encrypted training data into ciphertexts under the same key, so that traditional arithmetic operations can be run in a privacy-preserving manner to learn the model parameters.
4. Theoretical analysis and experimental validation demonstrating that PDLM can train deep learning models efficiently and effectively.
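The idea of running arithmetic directly on ciphertexts can be illustrated with a small sketch. The code below is not PDLM's toolkit: it assumes a textbook Paillier cryptosystem (an additively homomorphic scheme chosen here only for illustration), uses a single key pair, and relies on tiny hard-coded primes, so it is insecure and does not cover the multi-key conversion step. All function names (`keygen`, `encrypt`, `decrypt`, `add_cipher`, `scale_cipher`, `encrypted_weighted_sum`) are hypothetical. It shows how a party holding only ciphertexts could compute the weighted sum used in a neuron's forward pass when the weights are plaintext integers.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key generation with tiny primes (illustration only, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer m with 0 <= m < n."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_cipher(pub, c1, c2):
    """Homomorphic addition: the result decrypts to m1 + m2 (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def scale_cipher(pub, c, k):
    """Multiply the hidden plaintext by a public non-negative integer k."""
    n, _ = pub
    return pow(c, k, n * n)

def encrypted_weighted_sum(pub, enc_inputs, weights):
    """Compute Enc(sum_i w_i * x_i) from Enc(x_i) and plaintext weights w_i."""
    acc = encrypt(pub, 0)
    for c, w in zip(enc_inputs, weights):
        acc = add_cipher(pub, acc, scale_cipher(pub, c, w))
    return acc

if __name__ == "__main__":
    pub, priv = keygen()
    inputs = [2, 5, 7]    # private user features
    weights = [3, 1, 4]   # public integer weights
    enc_inputs = [encrypt(pub, x) for x in inputs]
    enc_sum = encrypted_weighted_sum(pub, enc_inputs, weights)
    print(decrypt(pub, priv, enc_sum))
```

In this sketch the party computing `encrypted_weighted_sum` never sees the inputs; only the holder of the private key can read the result. Real training also needs multiplication of two encrypted values, non-linear activations, and fixed-point encoding of real numbers, which plain Paillier does not provide on its own and which the sketch leaves out.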