Abstract: Training a Deep Learning (DL) model requires proprietary data and compute-intensive resources. To recoup training costs, a model provider can monetize DL models through Machine Learning as a Service (MLaaS). Typically, the model is deployed in the cloud and exposed through a publicly accessible Application Programming Interface (API) that charges for queries. However, model stealing attacks threaten this monetization scheme, because the attacker obtains a copy of the model and no longer pays for future queries. Specifically, an adversary queries a targeted model to obtain input-output pairs and uses them to train a substitute model that reverse-engineers the target's internal working mechanism, which deprives the model owner of its business advantage and leaks the privacy of the model. In this work, we observe that the confidence vector or the top-1 confidence returned from the model under attack (MUA) varies to a relatively large degree across different queried inputs. Rich internal information about the MUA is therefore leaked to the attacker, which facilitates her reconstruction of a substitute model. We thus propose to leverage adversarial confidence perturbation to hide this varied confidence distribution across queries and thereby defend against model stealing attacks (dubbed APMSA). In other words, the confidence vectors returned are now similar for queries from a specific category, considerably reducing information leakage from the MUA. To achieve this objective, through automated optimization, we constructively add delicate noise to each input query so that its confidence lies close to the decision boundary of the MUA. This process resembles crafting adversarial examples, with the distinction that the hard label is preserved to be the same as that of the queried input. This retains the inference utility for normal users (i.e., inference accuracy is not sacrificed) but bounds the confidence information leaked to the attacker to a small constrained area (i.e., close to the decision boundary). The latter greatly deteriorates the accuracy of the attacker's substitute model. As APMSA serves as a plug-in front-end and requires no change to the MUA, it is generic and easy to deploy. The high efficacy of APMSA is validated through experiments on the CIFAR10 and GTSRB datasets. Given a ResNet-18 MUA trained on CIFAR10, our defense can degrade the accuracy of the stolen model by up to 15% (rendering the stolen model useless to a large extent) with a 0% accuracy drop for normal users' hard-label inference requests.
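The idea of pushing a query's confidence toward the decision boundary while keeping its hard label can be illustrated with a minimal sketch. It is not the paper's optimization procedure, which the abstract does not specify. It assumes a toy linear softmax classifier in place of the MUA, signed-gradient steps on the top-1 versus runner-up logit margin, and an L-infinity noise budget. The function name `perturbed_confidence` and the parameters `eps`, `step`, and `iters` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def perturbed_confidence(W, b, x, eps=0.5, step=0.05, iters=100):
    """Toy stand-in for a defensive front-end (assumption, not the paper's code).

    Adds bounded noise to the query x so that the returned confidence
    vector moves toward the decision boundary, and stops before the
    hard label (argmax) would change.
    """
    W = np.asarray(W, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x, dtype=float)

    conf = softmax(W @ x + b)
    label = int(np.argmax(conf))
    delta = np.zeros_like(x)

    for _ in range(iters):
        logits = W @ (x + delta) + b
        masked = logits.copy()
        masked[label] = -np.inf
        runner_up = int(np.argmax(masked))

        # For a linear model, the gradient of the margin
        # logits[label] - logits[runner_up] w.r.t. the input is constant.
        grad = W[label] - W[runner_up]
        candidate = np.clip(delta - step * np.sign(grad), -eps, eps)
        cand_conf = softmax(W @ (x + candidate) + b)

        # Accept the step only if the hard label is preserved and the
        # top-1 confidence actually decreases; otherwise stop.
        if int(np.argmax(cand_conf)) != label or cand_conf[label] >= conf[label]:
            break
        delta, conf = candidate, cand_conf

    return label, conf


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    x = [2.0, 0.0]
    print("original :", softmax(np.array(W) @ np.array(x) + np.array(b)))
    print("perturbed:", perturbed_confidence(W, b, x))
```

In this sketch a normal user who only reads the hard label sees the same prediction, while the confidence vector that reaches a potential attacker is flatter and carries less information about how far the query sits from the boundary. A real deployment would compute the gradient through the actual network rather than use a fixed linear margin.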

APMSA: Adversarial Perturbation Against Model Stealing Attacks.
