Abstract: Model-serving systems have become increasingly popular, especially in real-time web applications. In such systems, users send queries to the server and specify the desired performance metrics (e.g., accuracy, latency). The server maintains a set of models (a model zoo) in the back-end and serves each query based on the specified metrics. This paper examines the security of such systems, specifically their robustness against model extraction attacks. Existing black-box attacks assume that a single model can be repeatedly selected for serving inference requests. Modern inference serving systems break this assumption, so these attacks cannot be directly applied to extract a victim model: models are hidden behind a layer of abstraction exposed by the serving system, and an attacker can no longer identify which model she is interacting with. To this end, we first propose a query-efficient fingerprinting algorithm that enables the attacker to trigger any desired model consistently. We show that, with our fingerprinting algorithm, model extraction achieves fidelity and accuracy scores within $1\%$ of the scores obtained when attacking a single, explicitly specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$ gain in fidelity compared to the naive attack. Second, we counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics. The proposed defense reduces the attack's accuracy and fidelity by up to $9.8\%$ and $4.8\%$, respectively (on medium-sized model extraction). Third, we show that the proposed defense induces a fundamental trade-off between the level of protection and system goodput, achieving configurable and significant protection against victim model extraction while maintaining acceptable goodput ($>80\%$). We implement the proposed defense in a real system and plan to open-source it.
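
The serving and defense ideas described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-model zoo and its numbers, the selection rule (most accurate model within the latency budget), the choice of Gaussian noise, and the names `Model`, `ZOO`, `select_model`, and `select_model_noisy`. The paper's actual selection policy and noise mechanism may differ.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    accuracy: float    # accuracy in [0, 1] (illustrative value)
    latency_ms: float  # inference latency in milliseconds (illustrative value)


# Hypothetical model zoo; the numbers are made up for this example.
ZOO = [
    Model("small", 0.70, 5.0),
    Model("medium", 0.80, 15.0),
    Model("large", 0.90, 40.0),
]


def select_model(zoo, latency_budget_ms):
    """Return the most accurate model that fits the latency budget, or None."""
    feasible = [m for m in zoo if m.latency_ms <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m.accuracy)


def select_model_noisy(zoo, latency_budget_ms, noise_std_ms, rng):
    """Perturb the requested budget with Gaussian noise before selecting.

    Assumption: zero-mean Gaussian noise on the latency budget. The same
    request may then map to different models across repeated queries.
    """
    noisy_budget = latency_budget_ms + rng.gauss(0.0, noise_std_ms)
    return select_model(zoo, noisy_budget)


if __name__ == "__main__":
    rng = random.Random(0)
    # Without noise, a fixed budget always triggers the same model.
    print(select_model(ZOO, 20.0).name)
    # With noise, repeated identical requests can hit different models.
    served = [select_model_noisy(ZOO, 20.0, 20.0, rng) for _ in range(10)]
    print([m.name if m else None for m in served])
```

Under these assumptions, a larger `noise_std_ms` makes it harder to trigger one model consistently, but it also raises the chance that a query is served by a model slower than the requested budget (or by none at all), which is one way to picture the protection versus goodput trade-off.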

Stealing Machine Learning Models via Prediction APIs

Extracting Robust Models with Uncertain Examples

D-DAE: Defense-Penetrating Model Extraction Attacks

Beyond Labeling Oracles: What does it mean to steal ML models?

Model Extraction Attacks Revisited

Model Extraction Attacks and Defenses on Cloud-Based Machine Learning Models

Exploring Connections Between Active Learning and Model Extraction

When Machine Learning Models Leak: An Exploration of Synthetic Training Data

Extracting Cloud-based Model with Prior Knowledge

Thief, Beware of What Get You There: Towards Understanding Model Extraction Attack

Preventing Neural Network Model Exfiltration in Machine Learning Hardware Accelerators

MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction

Model for Peanuts: Hijacking ML Models without Training Access is Possible

MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI

InverseNet: Augmenting Model Extraction Attacks with Training Data Inversion

Model-Reuse Attacks on Deep Learning Systems

ES Attack: Model Stealing against Deep Neural Networks without Data Hurdles

Precise Extraction of Deep Learning Models via Side-Channel Attacks on Edge/Endpoint Devices

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

SwiftTheft: A Time-Efficient Model Extraction Attack Framework Against Cloud-Based Deep Neural Networks

Teach LLMs to Phish: Stealing Private Information from Language Models