Abstract: The massive deployment of Machine Learning (ML) models has been accompanied by the emergence of several attacks that threaten their trustworthiness and raise ethical and societal concerns such as invasion of privacy, discrimination risks, and lack of accountability. Model hijacking is one of these attacks, in which the adversary aims to hijack a victim model to execute a task different from its original one. Model hijacking creates accountability and security risks, since the owner of a hijacked model can be framed for having their model offer illegal or unethical services. Prior state-of-the-art works consider model hijacking a training-time attack, in which the adversary requires access to the training of the ML model to execute the attack. In this paper, we consider a stronger threat model in which the attacker has no access to the training phase of the victim model. Our intuition is that ML models, which are typically over-parameterized, might (unintentionally) learn more than the task they are trained for. We propose SnatchML, a simple approach for model hijacking at inference time: it classifies unknown input samples using distance measures, computed in the latent space of the victim model, to previously known samples associated with the hijacking task classes. We show empirically with SnatchML that benign pre-trained models can execute tasks that are semantically related to the initial task. Surprisingly, this can be true even for hijacking tasks unrelated to the original task. We also explore different methods to mitigate this risk. We first propose a novel approach we call meta-unlearning, designed to help the model unlearn a potentially malicious task while training on the original task dataset. We also provide insights on over-parameterization as one possible inherent factor that makes model hijacking easier, and we accordingly propose a compression-based countermeasure against this attack.
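The inference-time idea described in the abstract, classifying a query by its latent-space distance to known samples of the hijacking classes, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the victim model is replaced by a hypothetical fixed linear map (`toy_victim`), the distance is cosine distance, and each hijacking class is summarised by the mean of its reference features. The function names are likewise hypothetical.

```python
import numpy as np


def extract_features(victim_model, inputs):
    """Query the (frozen) victim model and return L2-normalised latent vectors."""
    feats = np.asarray(victim_model(np.asarray(inputs, dtype=float)), dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.clip(norms, 1e-12, None)


def build_class_centroids(victim_model, ref_inputs, ref_labels):
    """Average the latent vectors of the known samples of each hijacking class."""
    feats = extract_features(victim_model, ref_inputs)
    ref_labels = np.asarray(ref_labels)
    classes = np.unique(ref_labels)
    centroids = np.stack([feats[ref_labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def hijack_predict(victim_model, classes, centroids, query_inputs):
    """Assign each query to the class whose centroid is closest in cosine distance."""
    feats = extract_features(victim_model, query_inputs)
    unit_centroids = centroids / np.clip(
        np.linalg.norm(centroids, axis=1, keepdims=True), 1e-12, None
    )
    cosine_distance = 1.0 - feats @ unit_centroids.T
    return classes[np.argmin(cosine_distance, axis=1)]


# Hypothetical stand-in for a victim model: a fixed linear map to a 3-D "latent space".
# A real attack would read the outputs or intermediate features of a deployed network.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])


def toy_victim(x):
    return x @ W


# A few labelled samples of the hijacking task (no training of the victim involved).
ref_inputs = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]]
ref_labels = [0, 0, 1, 1]

classes, centroids = build_class_centroids(toy_victim, ref_inputs, ref_labels)
preds = hijack_predict(toy_victim, classes, centroids, [[0.9, 0.1], [0.2, 0.8]])
print(preds.tolist())
```

Note that the sketch only queries the model and never updates its weights, which matches the threat model stated in the abstract; the choice of distance measure and of class summary is one option among several.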

Model for Peanuts: Hijacking ML Models without Training Access is Possible