Abstract: Deep learning unlocks applications with societal impact, e.g., detecting child exploitation imagery and genomic analysis of rare diseases. Deployment, however, requires compliance with stringent privacy regulations, so training algorithms that preserve the privacy of training data are urgently needed. Purely cryptographic approaches can protect privacy, but they remain costly, even when they rely on two or more non-colluding servers. Operations that seem trivial in plaintext quickly become prohibitively inefficient when a series of them is "crypto-processed," e.g., (dynamic) quantization for ensuring that intermediate values do not overflow. Slalom, recently proposed by Tramèr and Boneh, is the first solution that leverages both a GPU (for efficient batch computation) and a trusted execution environment (TEE) (for minimizing the use of cryptography). Roughly, it relies on extensive pre-computation over known and fixed weights, and hence it supports only private inference. Five related problems for private training are left unaddressed. Goten, our privacy-preserving training and prediction framework, tackles all five problems simultaneously via our careful design over the "mismatched" cryptographic and GPU data types (due to the tension between precision and efficiency) and our round-optimal GPU-outsourcing protocol (hence minimizing the communication cost between servers). It 1) stochastically trains a low-bitwidth yet accurate model, 2) supports dynamic quantization (a challenge left by Slalom), 3) minimizes the memory-swapping overhead of the memory-limited TEE and its communication with the GPU, 4) crypto-protects the (dynamic) model weights from the untrusted GPU, and 5) outperforms a pure-TEE system, even without the pre-computation needed by Slalom. As a baseline, we build CaffeScone, which secures Caffe using a TEE but not a GPU; Goten shows a 6.84x speed-up on the whole VGG-11. Goten also outperforms Falcon, the latest secure multi-server cryptographic solution proposed by Wagh et al., by 132.64x on VGG-11. Lastly, we demonstrate Goten's efficacy in training models for breast cancer diagnosis over sensitive images.

Confidential Computing on nVIDIA Hopper GPUs: A Performance Benchmark Study

Confidential Computing on nVIDIA H100 GPU: A Performance Benchmark Study

Enabling Rack-scale Confidential Computing using Heterogeneous Trusted Execution Environment

Fastrack: Fast IO for Secure ML using GPU TEEs

TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption

Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads

Empowering Data Centers for Next Generation Trusted Computing

Goten: GPU-Outsourcing Trusted Execution of Neural Network Training

Enabling Privacy-Preserving, Compute- and Data-Intensive Computing using Heterogeneous Trusted Execution Environment

Confidential Computing on Heterogeneous CPU-GPU Systems: Survey and Future Directions

Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

Performance Analysis of Scientific Computing Workloads on Trusted Execution Environments

Efficient Privacy-Preserving Machine Learning with Lightweight Trusted Hardware

3LegRace: Privacy-Preserving DNN Training over TEEs and GPUs

Honeycomb: Secure and Efficient GPU Executions via Static Validation

An Experimental Evaluation of TEE technology Evolution: Benchmarking Transparent Approaches based on SGX, SEV, and TDX

Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training

Tally: Non-Intrusive Performance Isolation for Concurrent Deep Learning Workloads

Toward Scalable Fully Homomorphic Encryption Through Light Trusted Computing Assistance

GOAT: GPU Outsourcing of Deep Learning Training With Asynchronous Probabilistic Integrity Verification Inside Trusted Execution Environment