Abstract: Recent studies show that machine learning models are vulnerable to model extraction attacks, in which an adversary builds a substitute model that achieves almost the same performance as a black-box victim model simply by querying the victim model. To defend against such attacks, a series of methods has been proposed that disrupt the query results before returning them to potential attackers, greatly degrading the performance of existing model extraction attacks. In this paper, we make the first attempt to develop a defense-penetrating model extraction attack framework, named D-DAE, which aims to break disruption-based defenses. The linchpins of D-DAE are two modules, disruption detection and disruption recovery, which can be integrated with generic model extraction attacks. More specifically, after obtaining query results from the victim model, the disruption detection module infers the defense mechanism adopted by the defender. We design a meta-learning-based disruption detection algorithm that learns the fundamental differences between the distributions of disrupted and undisrupted query results. The algorithm generalizes well even without access to the original training dataset of the victim model. Given the detected defense mechanism, the disruption recovery module tries to restore a clean query result from the disrupted query result using well-designed generative models. Our extensive evaluations on the MNIST, FashionMNIST, CIFAR-10, GTSRB, and ImageNette datasets demonstrate that D-DAE can enhance the substitute model accuracy of existing model extraction attacks by as much as 82.24% in the face of four state-of-the-art defenses and combinations of multiple defenses. We also verify the effectiveness of D-DAE in penetrating unknown defenses in real-world APIs hosted by Microsoft Azure and Face++.
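
The query, detect, recover flow described in the abstract can be illustrated with a minimal Python sketch. This is not the D-DAE implementation: every name here (`extract_with_recovery`, `toy_victim`, `toy_detect`, `toy_recover`) is hypothetical, the rounding defense is an assumed example of a disruption, a simple grid check stands in for the meta-learning-based detector, and renormalization stands in for the generative recovery models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Probs = List[float]


def extract_with_recovery(
    queries: Sequence[float],
    victim: Callable[[float], Probs],
    detect: Callable[[Sequence[Probs]], str],
    recoverers: Dict[str, Callable[[Probs], Probs]],
) -> Tuple[str, List[Tuple[float, Probs]]]:
    """Query the victim, guess its defense, and return recovered (input, output) pairs.

    The returned pairs are what a generic extraction attack would then use
    to train its substitute model.
    """
    raw = [victim(x) for x in queries]                 # possibly disrupted outputs
    defense = detect(raw)                              # disruption detection step
    recover = recoverers.get(defense, lambda p: p)     # unknown defense: pass through
    return defense, [(x, recover(p)) for x, p in zip(queries, raw)]


def toy_victim(x: float) -> Probs:
    """Hypothetical two-class victim that rounds its outputs to one decimal."""
    p = min(max(x, 0.0), 1.0)
    return [round(p, 1), round(1.0 - p, 1)]


def toy_detect(outputs: Sequence[Probs]) -> str:
    """Toy detector (assumption): flag 'rounding' if every value lies on a 0.1 grid."""
    on_grid = all(abs(p * 10 - round(p * 10)) < 1e-9 for out in outputs for p in out)
    return "rounding" if on_grid else "none"


def toy_recover(p: Probs) -> Probs:
    """Toy recovery (assumption): renormalize so the probabilities sum to 1."""
    total = sum(p) or 1.0
    return [v / total for v in p]


if __name__ == "__main__":
    defense, pairs = extract_with_recovery(
        [0.23, 0.81], toy_victim, toy_detect, {"rounding": toy_recover}
    )
    print(defense, pairs)
```

Keeping detection and recovery as pluggable callables mirrors the abstract's point that the two modules can be combined with generic model extraction attacks; a real detector and real recovery models would replace the toy functions.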

A GAN-Based Defense Framework Against Model Inversion Attacks

NetGuard: Protecting Commercial Web APIs from Model Inversion Attacks Using GAN-generated Fake Samples

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

D-DAE: Defense-Penetrating Model Extraction Attacks

Boosting Model Inversion Attacks with Adversarial Examples

Inversion-guided Defense: Detecting Model Stealing Attacks by Output Inverting

A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

Model Inversion Attacks Through Target-Specific Conditional Diffusion Models

Model Inversion Attack via Dynamic Memory Learning

Reinforcement Learning-Based Black-Box Model Inversion Attacks

Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks

Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses

CALoR: Towards Comprehensive Model Inversion Defense

Re-thinking Model Inversion Attacks Against Deep Neural Networks

GAN-based Domain Inference Attack

Model Inversion Attacks: A Survey of Approaches and Countermeasures

Pseudo Label-Guided Model Inversion Attack Via Conditional Generative Adversarial Network

Breaking the Black-Box: Confidence-Guided Model Inversion Attack for Distribution Shift

Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning

Model Inversion Robustness: Can Transfer Learning Help?

Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models