Abstract: Training a Deep Learning (DL) model requires proprietary data and compute-intensive resources. To recoup training costs, a model provider can monetize DL models through Machine Learning as a Service (MLaaS). Typically, the model is deployed in the cloud and exposed through a publicly accessible Application Programming Interface (API) that serves paid queries. However, model stealing attacks threaten this monetization scheme because they let an adversary obtain the model without paying for extensive future queries. Specifically, an adversary queries a targeted model to obtain input-output pairs and then infers the model's internal working mechanism by reverse-engineering a substitute model, which deprives the model owner of its business advantage and leaks the privacy of the model. In this work, we observe that the confidence vector or the top-1 confidence returned by the model under attack (MUA) varies to a relatively large degree across different queried inputs. Rich internal information about the MUA is therefore leaked to the attacker, which facilitates her reconstruction of a substitute model. We thus propose to leverage adversarial confidence perturbation to hide this varied confidence distribution across queries and thereby defend against model stealing attacks (dubbed APMSA). In other words, the confidence vectors returned are now similar for queries from a specific category, considerably reducing information leakage from the MUA. To achieve this objective, we use automated optimization to add carefully crafted noise to each input query so that its confidence is close to the decision boundary of the MUA. This process is similar to crafting adversarial examples, with the distinction that the hard label is preserved to be the same as that of the queried input.
This retains the inference utility for normal users (i.e., inference accuracy is not sacrificed) while bounding the confidence information leaked to the attacker to a small constrained area (i.e., close to the decision boundary). The latter greatly degrades the accuracy of the attacker's substitute model. Because APMSA serves as a plug-in front-end and requires no change to the MUA, it is generic and easy to deploy. The high efficacy of APMSA is validated through experiments on the CIFAR10 and GTSRB datasets. Given a ResNet-18 MUA on CIFAR10, our defense can degrade the accuracy of the stolen model by up to 15% (rendering the stolen model useless to a large extent) with a 0% accuracy drop for normal users' hard-label inference requests.
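
The label-preserving perturbation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the optimization procedure, so the sketch assumes a toy linear softmax classifier as a stand-in for the MUA and plain gradient steps on the input. The names `predict`, `perturbed_confidence`, `step`, `max_iters`, and `margin` are hypothetical and introduced only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, x):
    """Confidence vector of a toy linear softmax model (assumed stand-in for the MUA)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def perturbed_confidence(W, b, x, step=0.05, max_iters=500, margin=0.05):
    """Push the query toward the decision boundary without changing its hard label.

    Returns the original hard label and the confidence vector of the perturbed query.
    """
    best = predict(W, b, x)
    label = max(range(len(best)), key=best.__getitem__)
    x_adv = list(x)
    for _ in range(max_iters):
        # Gradient of log p[label] with respect to the input for a linear softmax model.
        grad = [
            W[label][j] - sum(best[k] * W[k][j] for k in range(len(W)))
            for j in range(len(x_adv))
        ]
        # Step against the gradient to lower the top-1 confidence.
        cand = [xj - step * g for xj, g in zip(x_adv, grad)]
        p_new = predict(W, b, cand)
        runner_up = max(v for k, v in enumerate(p_new) if k != label)
        if p_new[label] - runner_up < margin:
            break  # the next step would flip the label or come too close to the boundary
        x_adv, best = cand, p_new
    return label, best


# Hypothetical 3-class model on 2-dimensional inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [3.0, 0.5]
label, conf = perturbed_confidence(W, b, x)
print(label, [round(c, 3) for c in conf])
```

In this sketch the front-end returns `conf` instead of the raw confidence vector: the hard label seen by a normal user is unchanged, while the returned confidences sit near the decision boundary regardless of how confidently the unperturbed query was classified. The `margin` parameter is an assumed safeguard that stops the search before the label could flip.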

SeInspect: Defending Model Stealing via Heterogeneous Semantic Inspection

D-DAE: Defense-Penetrating Model Extraction Attacks.

Model Stealing Detection for IoT Services Based on Multi-Dimensional Features

Making models more secure: An efficient model stealing detection method

Defending Against Model Stealing Via Verifying Embedded External Features

LSSMSD: Defending Against Black-Box DNN Model Stealing Based on Localized Stochastic Sensitivity

Sniffer: A Novel Model Type Detection System Against Machine-Learning-as-a-Service Platforms

Stealthy Adversarial Examples for Semantic Segmentation in Remote Sensing

Protecting Object Detection Models from Model Extraction Attack Via Feature Space Coverage

I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences

FMSA: a Meta-Learning Framework-Based Fast Model Stealing Attack Technique Against Intelligent Network Intrusion Detection Systems

Inversion-guided Defense: Detecting Model Stealing Attacks by Output Inverting

Model Extraction Attacks and Defenses on Cloud-Based Machine Learning Models

ShrewdAttack: Low Cost High Accuracy Model Extraction.

Model Extraction Attacks Revisited

Detecting Semantic Attack in SCADA System: A Behavioral Model Based on Secondary Labeling of States-Duration Evolution Graph

APMSA: Adversarial Perturbation Against Model Stealing Attacks.

Hiding in Plain Sight: Disguising Data Stealing Attacks in Federated Learning

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks