Abstract: As state-of-the-art deep neural networks are deployed at the core of a growing number of AI-based products and services, the incentive for adversaries or commercial competitors to “copy” them (i.e., to appropriate their intellectual property, manifested in the knowledge they encapsulate) is expected to increase considerably over time. The most efficient way to extract or steal knowledge from such a network is to query it with a large dataset of random samples and record its outputs, and then train a student network to mimic those outputs, without making any assumptions about the original network. The most effective way to protect against such a mimicking attack is to answer queries with the classification result only, omitting the confidence values associated with the softmax layer. In this paper, we present a novel method for generating composite images for attacking a mentor neural network using a student model. Our method assumes no information regarding the mentor’s training dataset, architecture, or weights. Furthermore, even without access to the mentor’s softmax output values, our method successfully mimics the given neural network and is capable of stealing large portions (and sometimes all) of its encapsulated knowledge. Our student model achieved 99% accuracy relative to the protected mentor model on the CIFAR-10 test set. In addition, we demonstrate that our student network (which copies the mentor) is impervious to watermarking protection methods and would thus evade detection as a stolen model by existing dedicated techniques. Our results imply that all current neural networks are vulnerable to mimicking attacks, even if they divulge nothing but the most basic required output, and that a student model that mimics them cannot be easily detected using currently available techniques.
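
The query-and-mimic loop described in the abstract can be illustrated with a minimal toy sketch. It is not the composite-image method of the paper: the mentor here is a hypothetical hidden linear classifier, the queries are plain Gaussian random vectors rather than images, and the student is a linear softmax model chosen only to keep the example short (the attack itself assumes nothing about the mentor's architecture). All names (`make_mentor`, `train_student`, `run_demo`) are assumptions introduced for illustration. The one property it shares with the protected setting is that the mentor exposes only the predicted label, never its softmax values.

```python
import math
import random


def make_mentor(rng, dim, classes):
    """Hypothetical black-box mentor: hidden linear scores, only the label is exposed."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(classes)]

    def predict_label(x):
        scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
        return scores.index(max(scores))  # hard label only, no confidence values

    return predict_label


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_student(queries, labels, dim, classes, epochs=150, lr=0.5):
    """Fit a linear softmax student to the mentor's hard labels by gradient descent."""
    weights = [[0.0] * dim for _ in range(classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(classes)]
        for x, y in zip(queries, labels):
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
            for c in range(classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                for j in range(dim):
                    grad[c][j] += err * x[j]
        for c in range(classes):
            for j in range(dim):
                weights[c][j] -= lr * grad[c][j] / len(queries)

    def predict_label(x):
        scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
        return scores.index(max(scores))

    return predict_label


def run_demo(seed=0, dim=4, classes=3, n_queries=400, n_test=500):
    rng = random.Random(seed)
    mentor = make_mentor(rng, dim, classes)

    def sample():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    queries = [sample() for _ in range(n_queries)]   # random query set
    labels = [mentor(x) for x in queries]            # record label-only answers
    student = train_student(queries, labels, dim, classes)
    test = [sample() for _ in range(n_test)]         # fresh inputs for evaluation
    # Fraction of fresh inputs on which student and mentor output the same label.
    return sum(student(x) == mentor(x) for x in test) / n_test


agreement = run_demo()
print(f"student/mentor label agreement: {agreement:.2%}")
```

The agreement rate printed at the end is a toy analogue of the "accuracy relative to the mentor" figure quoted above: it measures how often the student reproduces the mentor's decision, not how often either model is correct with respect to ground truth.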
