Abstract: On-device ML introduces new security challenges: DNN models become white-box accessible to device users. Using this white-box information, adversaries can conduct effective model stealing (MS) and membership inference attacks (MIA). Shielding on-device DNN models inside Trusted Execution Environments (TEEs) aims to downgrade (easy) white-box attacks to (harder) black-box attacks. However, one major shortcoming is the sharply increased latency (up to 50X). To accelerate TEE-shielded DNN computation with GPUs, researchers have proposed several model partition techniques. These solutions, referred to as TEE-Shielded DNN Partition (TSDP), partition a DNN model into two parts, offloading the privacy-insensitive part to the GPU while shielding the privacy-sensitive part within the TEE. This paper benchmarks existing TSDP solutions using both MS and MIA across a variety of DNN models, datasets, and metrics. We find that existing TSDP solutions are vulnerable to privacy-stealing attacks and are not as safe as commonly believed. We also unveil the inherent difficulty of choosing optimal DNN partition configurations (i.e., the highest security with minimal utility cost) for present TSDP solutions: the experiments show that such “sweet spot” configurations vary across datasets and models. Based on lessons drawn from the experiments, we present TEESlice, a novel TSDP method that defends against MS and MIA during DNN inference. TEESlice follows a partition-before-training strategy, which allows for accurate separation of privacy-related weights from public weights. TEESlice delivers the same security protection as shielding the entire DNN model inside the TEE (the “upper-bound” security guarantee) with over 10X less overhead (in both experimental and real-world environments) than prior TSDP solutions and no accuracy loss.
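The TSDP idea described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation or any existing TSDP scheme: the `Layer` class, the `shielded` flag, the function names, and the choice of which layers to shield are all hypothetical, and the "TEE" and "GPU" are only labels in a simulation. The sketch shows only that a partition leaves the unshielded layers' weights readable to a device user.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Layer:
    """Toy stand-in for a DNN layer (hypothetical, for illustration only)."""
    name: str
    weights: List[float]
    shielded: bool  # True: assumed to run inside the (simulated) TEE

    def forward(self, x: List[float]) -> List[float]:
        # Elementwise scaling followed by ReLU; a placeholder for a real layer.
        return [max(0.0, w * v) for w, v in zip(self.weights, x)]


def run_partitioned(layers: List[Layer], x: List[float]) -> Tuple[List[float], List[Tuple[str, str]]]:
    """Run the layers in order and record where each one is assumed to execute."""
    trace = []
    for layer in layers:
        trace.append((layer.name, "TEE" if layer.shielded else "GPU"))
        x = layer.forward(x)
    return x, trace


def attacker_view(layers: List[Layer]) -> Dict[str, List[float]]:
    """Weights visible white-box to a device user: the unshielded layers only."""
    return {layer.name: layer.weights for layer in layers if not layer.shielded}


# Hypothetical partition: shield only the last layer, offload the rest.
example_layers = [
    Layer("conv1", [1.0, -2.0], shielded=False),
    Layer("conv2", [0.5, 3.0], shielded=False),
    Layer("fc", [2.0, 1.0], shielded=True),
]
```

Calling `attacker_view(example_layers)` returns the weights of `conv1` and `conv2`, which is the information an MS or MIA adversary could start from under this particular (assumed) partition; changing the `shielded` flags changes both the exposure and how much work falls inside the TEE.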

Model Protection: Real-Time Privacy-Preserving Inference Service for Model Privacy at the Edge

CHEETAH: An Ultra-Fast, Approximation-Free, and Privacy-Preserved Neural Network Framework based on Joint Obscure Linear and Nonlinear Computations

A Distributed Privacy-Preserving Framework for Deep Learning with Edge-Cloud Computing

Penetralium: Privacy-preserving and memory-efficient neural network inference at the edge

Privacy-Preserving Machine Learning Based Data Analytics on Edge Devices

Occlumency

EdgeSanitizer: Locally Differentially Private Deep Inference at the Edge for Mobile Data Analytics

ShadowNet: A Secure and Efficient On-device Model Inference System for Convolutional Neural Networks

Privacy preserving layer partitioning for Deep Neural Network models

Privacy‐preserving task offloading in mobile edge computing: A deep reinforcement learning approach

No Privacy Left Outside: on the (In-)Security of TEE-Shielded DNN Partition for On-Device ML

SecureML: A System for Scalable Privacy-Preserving Machine Learning

Privacy for Rescue: A New Testimony Why Privacy is Vulnerable In Deep Models

Efficient Privacy-Preserving Machine Learning with Lightweight Trusted Hardware

Privacy-preserving Security Inference Towards Cloud-Edge Collaborative Using Differential Privacy

SHAPER: A General Architecture for Privacy-Preserving Primitives in Secure Machine Learning

Secure MLaaS with Temper: Trusted and Efficient Model Partitioning and Enclave Reuse

CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics

A Secure and Privacy-Preserving Machine Learning Model Sharing Scheme for Edge-Enabled IoT

CrossNet: A Low-Latency MLaaS Framework for Privacy-Preserving Neural Network Inference on Resource-Limited Devices