Abstract: Recent studies show that machine learning models are vulnerable to model extraction attacks, in which an adversary builds a substitute model that achieves almost the same performance as a black-box victim model simply by querying it. To defend against such attacks, a series of methods have been proposed that disrupt the query results before returning them to potential attackers, greatly degrading the performance of existing model extraction attacks. In this paper, we make the first attempt to develop a defense-penetrating model extraction attack framework, named D-DAE, which aims to break disruption-based defenses. The linchpins of D-DAE are two modules, disruption detection and disruption recovery, which can be integrated with generic model extraction attacks. More specifically, after obtaining query results from the victim model, the disruption detection module infers the defense mechanism adopted by the defender. We design a meta-learning-based disruption detection algorithm that learns the fundamental differences between the distributions of disrupted and undisrupted query results. The algorithm generalizes well even without access to the original training dataset of the victim model. Given the detected defense mechanism, the disruption recovery module tries to restore a clean query result from the disrupted one using well-designed generative models. Our extensive evaluations on the MNIST, FashionMNIST, CIFAR-10, GTSRB, and ImageNette datasets demonstrate that D-DAE can enhance the substitute model accuracy of existing model extraction attacks by as much as 82.24% in the face of four state-of-the-art defenses and combinations of multiple defenses. We also verify the effectiveness of D-DAE in penetrating unknown defenses in real-world APIs hosted by Microsoft Azure and Face++.
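
The detect-then-recover flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names (`detect_disruption`, `recover`, `penetrate`) are hypothetical, a hand-written rule stands in for the meta-learning-based detector, a simple renormalization stands in for the generative recovery models, and the only defense considered is a toy "rounding" of the returned probabilities.

```python
from typing import Callable, List

Probs = List[float]


def detect_disruption(probs: Probs) -> str:
    """Toy stand-in for the disruption detection module.

    Assumption: the abstract describes a meta-learned detector over
    distributions of query results; this rule only flags a single output
    whose entries all lie on a coarse 0.1 grid as "rounding".
    """
    on_grid = all(abs(p * 10 - round(p * 10)) < 1e-9 for p in probs)
    return "rounding" if on_grid else "none"


def recover(probs: Probs, defense: str) -> Probs:
    """Toy stand-in for the disruption recovery module.

    Assumption: the abstract describes generative models; here a rounded
    output is merely renormalized so it sums to 1 again.
    """
    if defense == "rounding":
        total = sum(probs)
        if total > 0:
            return [p / total for p in probs]
    return list(probs)


def penetrate(query_victim: Callable[[object], Probs], x: object) -> Probs:
    """Query the victim, detect the (toy) defense, then try to undo it."""
    disrupted = query_victim(x)
    defense = detect_disruption(disrupted)
    return recover(disrupted, defense)


def rounding_victim(x: object) -> Probs:
    """Hypothetical victim API that rounds its probabilities to one decimal."""
    clean = [0.26, 0.38, 0.36]
    return [round(p, 1) for p in clean]


if __name__ == "__main__":
    # The recovered vector would then serve as the training label
    # for the substitute model in a generic extraction attack.
    print(penetrate(rounding_victim, x=None))
```

In this sketch the recovered output only approximates the clean probabilities; the point is the control flow (query, detect, recover, then train the substitute), not the quality of the recovery.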

DiffBreak: Breaking Diffusion-Based Purification with Adaptive Attacks

D-DAE: Defense-Penetrating Model Extraction Attacks

Attack As Defense: Characterizing Adversarial Examples Using Robustness

ADBM: Adversarial diffusion bridge model for reliable adversarial purification

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

Purify++: Improving Diffusion-Purification with Advanced Diffusion Models and Control of Randomness

Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies

DiffDefense: Defending against Adversarial Attacks via Diffusion Models

Robust Diffusion Models for Adversarial Purification

DiffuseDef: Improved Robustness to Adversarial Attacks

DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks

Diffusion-based Adversarial Purification for Intrusion Detection

Real-world Adversarial Defense against Patch Attacks based on Diffusion Model

Robust Evaluation of Diffusion-Based Adversarial Purification

Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model

Guided Diffusion Model for Adversarial Purification

Struggle with Adversarial Defense? Try Diffusion

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

Mitigating Advanced Adversarial Attacks with More Advanced Gradient Obfuscation Techniques

Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models