Abstract: Neural networks are susceptible to adversarial perturbations that are transferable across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability in generating transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measures the divergence in the predictions between the source model and another, independently trained model, referred to as the witness model. To understand the effect of model alignment, we conduct a geometric analysis of the resulting changes in the loss landscape. Extensive experiments on the ImageNet dataset, using a variety of model architectures, demonstrate that perturbations generated from aligned source models exhibit significantly higher transferability than those from the original source model.

What problem does this paper attempt to address?

The problem this paper addresses is how to improve the transferability of adversarial perturbations. Specifically, the authors propose a new model alignment technique that improves a given source model's ability to generate transferable adversarial perturbations.

### Problem Background

Neural networks are vulnerable to adversarial perturbations, and these perturbations can transfer between models: a perturbation generated by attacking one model can also deceive other models. This transferability has attracted wide attention because it poses security challenges for deployed machine-learning systems.

### Paper Goals

The authors' goal is to make the source model generate more transferable adversarial perturbations through model alignment. The method fine-tunes the parameters of the source model to minimize an alignment loss, which measures the difference between the predictions of the source model and those of another, independently trained model (called the witness model).

### Main Contributions

1. **Model Alignment Method**: Fine-tune the parameters of the source model by minimizing an alignment loss (such as the KL divergence) so that the adversarial perturbations it generates are more transferable.
2. **Geometric Analysis**: Study the impact of the alignment process on the loss landscape, finding that the loss surface of the aligned model is smoother, so the generated adversarial perturbations are more transferable.
3. **Experimental Verification**: Run extensive experiments on the ImageNet dataset with multiple model architectures, showing that adversarial perturbations generated by the aligned source model have higher transferability and that the method is compatible with multiple attack algorithms.

### Mathematical Formulas

- Alignment loss:
  \[ \ell_a(x, \theta_s, \theta_w) = d(z_q^s(x), z_q^w(x)) \]
  where \(d\) measures the difference between the outputs of the two models at the \(q\)-th layer, \(\theta_s\) and \(\theta_w\) are the parameters of the source model and the witness model respectively, and \(z_q^s(x)\) and \(z_q^w(x)\) are the outputs of the source model and the witness model at the \(q\)-th layer respectively.
- KL divergence (used to measure the difference between probability distributions):
  \[ D_{\text{KL}}(P \parallel Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} \]

### Conclusion

Aligning the source model can significantly improve the transferability of the adversarial perturbations it generates, which offers new ideas and methods for improving the security and robustness of deep-learning models.
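The alignment loss and KL divergence given under Mathematical Formulas can be illustrated with a small pure-Python sketch. It is not the authors' implementation: it assumes \(d\) is the KL divergence applied to softmax outputs, assumes the direction \(D_{\text{KL}}(P_w \parallel P_s)\) (witness first), and, as a toy simplification, updates the source logits directly instead of the source parameters \(\theta_s\). The function names `softmax`, `kl_divergence`, `alignment_loss`, and `alignment_step` are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i))."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with P(i) = 0 contribute nothing
    )


def alignment_loss(source_logits, witness_logits):
    """Alignment loss with d = KL divergence on softmax outputs.

    Assumption: the witness distribution is P and the source is Q.
    """
    return kl_divergence(softmax(witness_logits), softmax(source_logits))


def alignment_step(source_logits, witness_logits, lr=0.5):
    """One gradient-descent step on the source logits (toy stand-in for theta_s).

    For KL(p_w || softmax(z_s)), the gradient with respect to z_s
    is softmax(z_s) - p_w.
    """
    p_s = softmax(source_logits)
    p_w = softmax(witness_logits)
    return [z - lr * (ps - pw) for z, ps, pw in zip(source_logits, p_s, p_w)]


# Toy usage: the witness stays fixed while the source is pulled toward it.
source = [2.0, 0.5, -1.0]
witness = [0.5, 1.5, 0.0]
before = alignment_loss(source, witness)
for _ in range(50):
    source = alignment_step(source, witness)
after = alignment_loss(source, witness)
print("loss decreased:", after < before)
```

In a realistic setting the gradient would be backpropagated through the source network to its parameters while the witness model stays frozen; the sketch only shows how the loss is computed and why minimizing it pulls the source predictions toward the witness.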

Improving Adversarial Transferability via Model Alignment

Trust-aware Conditional Adversarial Domain Adaptation with Feature Norm Alignment

Enhancing Adversarial Transferability with Adversarial Weight Tuning

Understanding and Enhancing the Transferability of Adversarial Examples

Feature Augmentation for Adversarial Robustness

Bag of Tricks to Boost Adversarial Transferability

Improving Adversarial Transferability with Neighbourhood Gradient Information

On the Transferability of Adversarial Attacks against Neural Text Classifier

Are aligned neural networks adversarially aligned?

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability

Adaptive Feature Alignment for Adversarial Training

Improving Adversarial Transferability via Neuron Attribution-Based Attacks

Improving Transferability of Adversarial Examples via Bayesian Attacks

A Survey on Transferability of Adversarial Examples across Deep Neural Networks

Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks

Enhancing Adversarial Attacks: The Similar Target Method

Adversarial Training Helps Transfer Learning via Better Representations

Enhancing the transferability of adversarial samples with random noise techniques

Exploring Frequencies via Feature Mixing and Meta-Learning for Improving Adversarial Transferability

Improving the transferability of adversarial examples with path tuning