Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training

Rongwu Xu, Zhixuan Fang
2024-01-21
Abstract: Cloud deep learning platforms provide cost-effective deep neural network (DNN) training for customers who lack computation resources. However, cloud systems are often untrustworthy and vulnerable to attackers, leading to growing concerns about model privacy. Recently, researchers have sought to protect data privacy in deep learning by leveraging CPU trusted execution environments (TEEs), which minimize the use of cryptography, but existing works failed to simultaneously utilize the computational resources of GPUs to assist in training and prevent model leakage. This paper presents Tempo, the first cloud-based deep learning system that cooperates with TEE and distributed GPUs for efficient DNN training with model confidentiality preserved. To tackle the challenge of preserving privacy while offloading linear algebraic operations from TEE to GPUs for efficient batch computation, we introduce a customized permutation-based obfuscation algorithm to blind both inputs and model parameters. An optimization mechanism that reduces encryption operations is proposed for faster weight updates during backpropagation to speed up training. We implement Tempo and evaluate it with both training and inference for two prevalent DNNs. Empirical results indicate that Tempo outperforms baselines and offers sufficient privacy protection.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the issue of data privacy protection during deep neural network (DNN) training on cloud platforms. Specifically, the paper proposes a system called **Tempo**, which can efficiently perform DNN training using cloud computing resources while ensuring the privacy of both the model and the input data.

#### Main Issues

1. **Input Privacy Protection**: Users want to train models on cloud platforms without revealing their training data.
2. **Model Privacy Protection**: The trained models have commercial value, and users do not want these models to be accessed by third parties.

#### Current Research and Contributions

Existing research mainly focuses on using Trusted Execution Environments (TEE) to protect data privacy, but these methods often fail to fully utilize the computational resources of GPUs, leading to inefficient training. To address this, the paper proposes a new framework that combines TEE and distributed GPUs, ensuring both privacy and improved training speed.

#### Key Technologies

- **MM-obfuscation Algorithm**: A permutation-based obfuscation algorithm that can protect both model parameters and input data, reducing the overhead caused by encryption operations.
- **Distributed Training Strategy**: Combines data parallelism, tensor parallelism, and pipeline parallelism to optimize encryption and communication costs during distributed training.

Through these technological innovations, Tempo achieves efficient privacy-preserving DNN training and demonstrates superior performance in experiments.
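
To make the idea of permutation-based blinding for an offloaded matrix multiplication more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's MM-obfuscation algorithm: the function names (`blind`, `untrusted_matmul`, `unblind`) are hypothetical, the "GPU" is simulated by an ordinary function, and the scheme uses plain index permutations only. A bare permutation still exposes the multiset of matrix values, so a real system would presumably need further protection that this summary does not describe.

```python
import numpy as np


def make_permutation(n, rng):
    """Return a random permutation of range(n) and its inverse."""
    perm = rng.permutation(n)
    inv = np.argsort(perm)  # inv satisfies perm[inv] == arange(n)
    return perm, inv


def blind(X, W, rng):
    """Trusted side (hypothetical): shuffle X and W before offloading.

    Rows of X, the shared inner dimension, and columns of W are each
    permuted with a secret permutation. The inner permutation is applied
    to both operands, so it cancels out in the product.
    """
    r, r_inv = make_permutation(X.shape[0], rng)  # batch rows of X
    p, _ = make_permutation(X.shape[1], rng)      # shared inner dimension
    c, c_inv = make_permutation(W.shape[1], rng)  # output columns of W
    X_blind = X[r][:, p]
    W_blind = W[p][:, c]
    return X_blind, W_blind, (r_inv, c_inv)


def untrusted_matmul(X_blind, W_blind):
    """Untrusted side (stand-in for a GPU): sees only shuffled operands."""
    return X_blind @ W_blind


def unblind(Y_blind, secrets):
    """Trusted side: undo the row and column shuffles on the result."""
    r_inv, c_inv = secrets
    return Y_blind[r_inv][:, c_inv]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 6))  # toy batch of inputs
    W = rng.standard_normal((6, 3))  # toy layer weights
    X_b, W_b, secrets = blind(X, W, rng)
    Y = unblind(untrusted_matmul(X_b, W_b), secrets)
    # The recovered product matches X @ W up to floating-point rounding.
    print(np.allclose(Y, X @ W))
```

The sketch relies on the identity that permuting the shared inner dimension of both operands leaves the product unchanged, while the row and column permutations carry through to the result and can be inverted by whoever holds them. The comparison uses `np.allclose` rather than exact equality because reordering the inner dimension changes the floating-point summation order.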