Abstract: Deep neural networks exhibit vulnerability to adversarial examples that can transfer across different models. A particularly challenging problem is developing transferable targeted attacks that can mislead models into predicting specific target classes. While various methods have been proposed to enhance attack transferability, they often incur substantial computational costs while yielding limited improvements. Recent clean feature mixup methods use random clean features to perturb the feature space but lack optimization for disrupting adversarial examples, overlooking the advantages of attack-specific perturbations. In this paper, we propose Feature Tuning Mixup (FTM), a novel method that enhances targeted attack transferability by combining both random and optimized noises in the feature space. FTM introduces learnable feature perturbations and employs an efficient stochastic update strategy for optimization. These learnable perturbations facilitate the generation of more robust adversarial examples with improved transferability. We further demonstrate that attack performance can be enhanced through an ensemble of multiple FTM-perturbed surrogate models. Extensive experiments on the ImageNet-compatible dataset across various models demonstrate that our method achieves significant improvements over state-of-the-art methods while maintaining low computational cost.
What problem does this paper attempt to address?
The paper aims to improve the transferability of targeted adversarial examples, that is, examples crafted on a surrogate model that must mislead other deep neural networks (DNNs) into predicting a specific target class. Existing methods for generating such examples tend to incur high computational cost while delivering only limited gains. In addition, although several methods apply data augmentation in the image space, augmentation in the feature space remains comparatively underexplored.
### Core problems of the paper:
1. **Limitations of existing methods**: Existing transfer attack methods such as Clean Feature Mixup (CFM) have made progress, but they rely mainly on random clean features to perturb the feature space. These perturbations are not optimized to disrupt the adversarial example, so the benefit of attack-specific perturbations is left unused and the resulting transferability is limited.
2. **Trade-off between computational cost and effectiveness**: Many methods add significant computational overhead in exchange for better attack transferability. Methods such as CFM are effective, but they still leave room for improvement.
### New method proposed in the paper:
To address these problems, the authors propose Feature Tuning Mixup (FTM), a method that combines random and optimized noise in the feature space to enhance the transferability of targeted attacks. FTM introduces learnable feature perturbations and optimizes them with an efficient stochastic update strategy, producing more robust adversarial examples while keeping the computational cost low.
### Main contributions:
1. **Introducing attack-specific feature perturbations**: The paper re-examines feature-level augmentation methods and finds that adding attack-specific feature perturbations improves the effectiveness of existing attacks.
2. **Proposing the FTM method**: FTM strengthens targeted transfer attacks by combining random and optimized feature perturbations. It includes an efficient stochastic update strategy that improves attack transferability while maintaining computational efficiency.
3. **Experimental verification**: Extensive experiments on the ImageNet-compatible dataset show that FTM achieves significant improvements over existing methods across various source and target models while maintaining a low computational cost.
### Formula summary:
- Optimization objective for adversarial example generation:
\[
\arg \min_{x_{\text{adv}}} L(F(x_{\text{adv}}), y_t), \quad \text{s.t.} \quad \|x - x_{\text{adv}}\|_\infty \leq \epsilon
\]
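where \(F\) is the model under attack, \(x\) is the clean input, \(x_{\text{adv}}\) is the adversarial example, \(L\) is the adversarial loss function, \(y_t\) is the target label, and \(\epsilon\) is the perturbation budget.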
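As a rough illustration of this constrained objective, the sketch below runs an iterative targeted sign-gradient attack on a toy linear classifier and projects back onto the \(\ell_\infty\) ball after every step. It is not the paper's attack: the linear model, the cross-entropy loss, the helper name `targeted_linf_attack`, and the values of `eps`, `step`, and `n_iter` are all assumptions chosen to keep the example small.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()


def targeted_linf_attack(x, W, b, y_t, eps=0.1, step=0.02, n_iter=50):
    """Iterative targeted sign-gradient attack on a toy model F(x) = W @ x + b.

    Hypothetical helper for illustration only; not the paper's attack.
    """
    onehot = np.zeros(W.shape[0])
    onehot[y_t] = 1.0
    x_adv = x.copy()
    for _ in range(n_iter):
        probs = softmax(W @ x_adv + b)
        # Gradient of the cross-entropy L(F(x_adv), y_t) with respect to x_adv.
        grad = W.T @ (probs - onehot)
        # Descend on the loss so the prediction moves toward the target y_t.
        x_adv = x_adv - step * np.sign(grad)
        # Project back onto the L-infinity ball of radius eps around x.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Toy usage: two classes, identity weights, clean input favours class 0.
x = np.array([1.0, 0.0])
W = np.eye(2)
b = np.zeros(2)
x_adv = targeted_linf_attack(x, W, b, y_t=1)
```

The clipping step is what enforces \(\|x - x_{\text{adv}}\|_\infty \leq \epsilon\). In a transfer setting the gradient would come from a surrogate network rather than a closed-form linear model.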
- Forward process formula of FTM:
\[
\bar{z}_{k,i} = z_{k,i} + \beta \|z_{k,i}\| \cdot \frac{\Delta z_{k,i}}{\|\Delta z_{k,i}\| + \bar{\epsilon}}
\]
\[
z'_{k,i} =
\begin{cases}
(1 - \alpha_{k,i}) \odot \bar{z}_{k,i} + \alpha_{k,i} \odot z^c_{k,i}, & \tau_k < p \\
\bar{z}_{k,i}, & \text{otherwise}
\end{cases}
\]
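The two equations above can be read as a perturb-then-mix step, sketched below in NumPy. The summary does not define the symbols, so the reading here is an assumption: \(z_{k,i}\) is the feature of sample \(i\) at layer \(k\), \(\Delta z_{k,i}\) is the learnable perturbation, \(z^c_{k,i}\) is a clean feature, \(\alpha_{k,i}\) is a random mixing ratio, \(\tau_k\) is a uniform random draw compared against the mixing probability \(p\), and \(\bar{\epsilon}\) is a small constant that avoids division by zero. The per-sample norm, the channel-wise sampling of \(\alpha\), the function name `ftm_forward`, and all default values are likewise illustrative choices, not details confirmed by the source.

```python
import numpy as np


def ftm_forward(z, delta, z_clean, beta=0.1, p=0.5, alpha_max=0.75,
                eps_bar=1e-8, rng=None):
    """Sketch of the FTM forward step for one layer (hypothetical helper).

    z, delta, z_clean: arrays of shape (batch, channels, ...).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: norms are taken per sample over all non-batch axes.
    axes = tuple(range(1, z.ndim))
    z_norm = np.sqrt((z ** 2).sum(axis=axes, keepdims=True))
    d_norm = np.sqrt((delta ** 2).sum(axis=axes, keepdims=True))
    # First equation: add the perturbation, rescaled to beta * ||z||.
    z_bar = z + beta * z_norm * delta / (d_norm + eps_bar)
    # Second equation: mix with clean features when tau < p.
    tau = rng.random()
    if tau < p:
        # Assumption: one mixing ratio per sample and channel, in [0, alpha_max).
        alpha_shape = z.shape[:2] + (1,) * (z.ndim - 2)
        alpha = rng.uniform(0.0, alpha_max, size=alpha_shape)
        return (1.0 - alpha) * z_bar + alpha * z_clean
    return z_bar


# Toy usage with random arrays standing in for real feature maps.
rng = np.random.default_rng(0)
z = rng.normal(size=(2, 3, 4, 4))
delta = rng.normal(size=z.shape)
z_clean = rng.normal(size=z.shape)
z_bar = ftm_forward(z, delta, z_clean, p=0.0, rng=rng)  # never mixes
z_mix = ftm_forward(z, delta, z_clean, p=1.0, rng=rng)  # always mixes
```

Normalizing \(\Delta z_{k,i}\) before scaling means the perturbation's magnitude is always a fixed fraction \(\beta\) of the feature norm, so only its direction needs to be learned. How \(\Delta z_{k,i}\) is updated during the attack is not covered by this sketch.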
Through these improvements, FTM not only improves the transferability of adversarial examples but also performs well in terms of computational efficiency.