Abstract: The ability to transfer adversarial attacks from one model (the surrogate) to another model (the victim) has been an issue of concern within the machine learning (ML) community. The ability to successfully evade unseen models makes attacks uncomfortably easy to implement. In this work we note that, as currently studied, transfer attack research gives the attacker an unrealistic advantage: the attacker has the exact same training data as the victim. We present the first study of transferring adversarial attacks that focuses on the data available to attacker and victim under imperfect settings and without querying the victim, where there is some variable level of overlap in the exact data used or in the classes learned by each model. This threat model is relevant to applications in medicine, malware, and others. Under this new threat model, attack success rate is not correlated with data or class overlap in the way one would expect, and it varies with the dataset. This makes it difficult for attacker and defender to reason about each other, and it contributes to the broader study of model robustness and security. We remedy this by developing a masked version of Projected Gradient Descent that simulates class disparity, which enables the attacker to reliably estimate a lower bound on their attack's success.

What problem does this paper attempt to address?

The main problem this paper addresses is an unrealistic advantage granted to the attacker in current research on transferring adversarial attacks between machine-learning models: the attacker is assumed to have exactly the same training data as the victim. This assumption rarely holds in real-world settings, especially in fields such as medicine and malware detection. The authors therefore propose a new threat model in which the data and class overlap between attacker and victim is unknown and the attacker cannot query the victim model. Specifically, the paper focuses on the following aspects:

1. **The influence of data and class overlap**: The paper measures the success rate of adversarial attacks transferred from surrogate models to victim models under different levels of data and class overlap. The results show that the attack success rate is not always directly proportional to the degree of data or class overlap, which makes it difficult for attackers and defenders to predict each other's behavior.
2. **The influence of adversarial training**: The paper examines how the attack success rate changes when both the surrogate model and the victim model are adversarially trained. Although adversarial training usually improves a model's robustness against attacks, its effect is not as significant as expected under this new threat model.
3. **Eliminating class uncertainty**: To reduce the influence of class uncertainty on the attack success rate, the authors developed a modified projected gradient descent (PGD) method, the "masked PGD attack". This method randomly masks part of the class outputs in each iteration to simulate class differences, so that the attacker can more reliably estimate a lower bound on the success rate of the attack.
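
The masking idea in item 3 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear softmax surrogate (`W`, `b`), the function name `masked_pgd_linear`, the independent per-class masking probability `mask_prob`, the choice to always keep the true class unmasked, and the L-infinity budget `eps` with step size `alpha` are all assumptions made for illustration.

```python
import numpy as np

def masked_pgd_linear(x, y, W, b, eps=0.3, alpha=0.05, steps=20,
                      mask_prob=0.3, rng=None):
    """L-inf PGD against a linear softmax surrogate with random class masking.

    x: (n, d) inputs, y: (n,) integer labels, W: (d, k) weights, b: (k,) biases.
    All names and defaults here are hypothetical, chosen for illustration only.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, k = x.shape[0], W.shape[1]
    onehot = np.eye(k)[y]
    x_adv = x.copy()
    for _ in range(steps):
        # Hide a random subset of class outputs in this iteration.
        # Assumption: the true class is always kept so the loss stays defined.
        keep = rng.random((n, k)) >= mask_prob
        keep[np.arange(n), y] = True

        # Softmax over the unmasked classes only (masked logits become -inf).
        logits = np.where(keep, x_adv @ W + b, -np.inf)
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)

        # Gradient of the masked cross-entropy with respect to the input.
        grad_x = (probs - onehot) @ W.T

        # Signed ascent step, then projection back onto the eps-ball around x.
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

With `mask_prob=0.0` this reduces to ordinary untargeted PGD on the surrogate. With a deep surrogate, the analytic gradient would be replaced by automatic differentiation, but the per-iteration masking step would look the same.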
### Main contributions of the paper

- **Proposing a new threat model**: The paper is the first to systematically study adversarial attack transfer when the data and class overlap between attacker and victim is unknown.
- **Revealing the limitations of existing methods**: When data and class overlap are uncertain, existing adversarial attack methods and intuitions may no longer apply, which makes the attack unreliable.
- **Developing a new attack method**: The paper proposes the "masked PGD attack", which simulates class differences so that the attacker can attack more effectively when class overlap is uncertain.

### Experimental results

- **Standard adversarial transfer attack**: Reducing data overlap has a significant impact on the attack success rate, while reducing class overlap strongly affects the attacker's estimate of success, which may lead to overconfidence or underconfidence.
- **Adversarial training**: Although adversarial training can improve the robustness of the model, its effect is weaker than expected under the new threat model.
- **Masked PGD attack**: By eliminating class uncertainty, the masked PGD attack behaves more consistently and predictably, so that more shared data usually leads to a higher attack success rate.

In short, by proposing a new threat model and an improved attack method, this paper provides a theoretical and practical basis for understanding and dealing with adversarial attacks in the real world.
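
To make the overlap settings above concrete, the following sketch shows one way to draw victim and surrogate training subsets that share a chosen fraction of samples, and to score a transfer attack against the victim. The helper names `split_with_overlap` and `transfer_success_rate`, and the specific splitting procedure, are assumptions for illustration; the paper's actual experimental protocol may differ.

```python
import numpy as np

def split_with_overlap(n_samples, subset_size, overlap, seed=0):
    """Return (victim_idx, surrogate_idx), each of length subset_size,
    sharing round(overlap * subset_size) sample indices."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")
    n_shared = int(round(overlap * subset_size))
    needed = 2 * subset_size - n_shared  # distinct samples required in total
    if needed > n_samples:
        raise ValueError("not enough samples for the requested split")
    perm = np.random.default_rng(seed).permutation(n_samples)
    shared = perm[:n_shared]
    victim_idx = np.concatenate([shared, perm[n_shared:subset_size]])
    surrogate_idx = np.concatenate([shared, perm[subset_size:needed]])
    return victim_idx, surrogate_idx

def transfer_success_rate(victim_predict, x_adv, y_true):
    """Fraction of adversarial inputs that the victim misclassifies.

    victim_predict is any callable mapping inputs to predicted labels.
    """
    return float(np.mean(victim_predict(x_adv) != y_true))
```

For example, `split_with_overlap(1000, 400, 0.25)` yields two index sets of 400 samples with 100 in common. Class overlap could be handled analogously by splitting the label set instead of the sample indices. The success rate here counts every misclassified adversarial input, which is a simplification; a stricter variant would first exclude inputs the victim already misclassifies before the attack.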

Adversarial Transfer Attacks With Unknown Data and Class Overlap
