Abstract: Recent studies show that machine learning models are vulnerable to model extraction attacks, in which an adversary builds a substitute model that achieves nearly the same performance as a black-box victim model simply by querying it. To defend against such attacks, a series of methods have been proposed that disrupt the query results before returning them to potential attackers, greatly degrading the performance of existing model extraction attacks. In this paper, we make the first attempt to develop a defense-penetrating model extraction attack framework, named D-DAE, which aims to break disruption-based defenses. The linchpins of D-DAE are two modules, disruption detection and disruption recovery, which can be integrated with generic model extraction attacks. More specifically, after obtaining query results from the victim model, the disruption detection module infers the defense mechanism adopted by the defender. We design a meta-learning-based disruption detection algorithm that learns the fundamental differences between the distributions of disrupted and undisrupted query results. The algorithm generalizes well even without access to the original training dataset of the victim model. Given the detected defense mechanism, the disruption recovery module tries to restore a clean query result from the disrupted one using well-designed generative models. Our extensive evaluations on the MNIST, FashionMNIST, CIFAR-10, GTSRB, and ImageNette datasets demonstrate that D-DAE can enhance the substitute model accuracy of existing model extraction attacks by as much as 82.24% in the face of four state-of-the-art defenses and combinations of multiple defenses. We also verify the effectiveness of D-DAE in penetrating unknown defenses in real-world APIs hosted by Microsoft Azure and Face++.
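
The detect-then-recover flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the "rounding" defense is a toy disruption, a hand-written rule stands in for the meta-learning-based detector, and simple renormalization stands in for the generative recovery models. It is not the D-DAE implementation.

```python
from typing import List, Tuple


def looks_rounded(probs: List[float], decimals: int = 1) -> bool:
    # Toy check: every score already sits on the rounding grid.
    return all(abs(p - round(p, decimals)) < 1e-9 for p in probs)


def detect_defense(results: List[List[float]]) -> str:
    # Hypothetical stand-in for the disruption detection module.
    # A fixed rule replaces the learned (meta-learning) detector.
    if results and all(looks_rounded(r) for r in results):
        return "rounding"
    return "none"


def recover(probs: List[float], defense: str) -> List[float]:
    # Hypothetical stand-in for the disruption recovery module.
    # Renormalization replaces the generative models of the paper.
    if defense == "none":
        return list(probs)
    total = sum(probs)
    if total == 0:
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in probs]


def penetrate(results: List[List[float]]) -> Tuple[str, List[List[float]]]:
    # Step 1: infer the defense from the returned query results.
    defense = detect_defense(results)
    # Step 2: restore each query result given the inferred defense.
    return defense, [recover(r, defense) for r in results]


# Example query results as a victim API might return them after rounding.
disrupted = [[0.7, 0.2, 0.0], [0.1, 0.1, 0.9]]
defense, cleaned = penetrate(disrupted)
```

The recovered outputs in `cleaned` would then be used as training labels by whichever generic model extraction attack the two modules are attached to.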

D-DAE: Defense-Penetrating Model Extraction Attacks.
