Abstract: A longstanding problem of deep learning models is their vulnerability to adversarial examples, which are often generated by applying imperceptible perturbations to natural examples. Adversarial examples exhibit cross-model transferability, enabling attacks on black-box models with limited information about their architectures and parameters. Model ensembling is an effective strategy to improve transferability by attacking multiple surrogate models simultaneously. However, as prior studies usually adopt few models in the ensemble, there remains an open question of whether scaling the number of models can further improve black-box attacks. Inspired by the findings in large foundation models, we investigate the scaling laws of black-box adversarial attacks in this work. By analyzing the relationship between the number of surrogate models and the transferability of adversarial examples, we conclude with clear scaling laws, emphasizing the potential of using more surrogate models to enhance adversarial transferability. Extensive experiments verify the claims on standard image classifiers, multimodal large language models, and even proprietary models like GPT-4o, demonstrating consistent scaling effects and impressive attack success rates with more surrogate models. Further studies by visualization indicate that scaled attacks bring better interpretability in semantics, suggesting that the common features of models are captured.
What problem does this paper attempt to address?
This paper addresses the vulnerability of deep learning models to adversarial examples, particularly in the black-box attack setting. Specifically, it asks whether increasing the number of surrogate models in an ensemble improves the cross-model transferability of adversarial examples. Its main contribution is a study of the relationship between the number of surrogate models and transferability, from which it derives an explicit scaling law that highlights the potential of using more surrogate models to enhance adversarial transferability.
### Background of the Paper
1. **Vulnerability of Deep Learning Models**:
- Deep learning models are highly vulnerable to adversarial examples. These are usually generated by applying small, almost imperceptible perturbations to natural samples, and the perturbations can cause the model to make wrong predictions.
2. **Black-Box Attack**:
- A black-box attack is one carried out by an attacker who has limited information about the architecture and parameters of the target model. It relies on the cross-model transferability of adversarial examples: an adversarial example generated on one model is often effective against other models as well.
3. **Model Ensemble**:
- Model ensembling improves the transferability of adversarial examples by attacking multiple surrogate models simultaneously. However, previous studies usually adopted only a few surrogate models, so whether black-box attacks can be further improved by increasing the number of surrogate models remains an open question.
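To make the ensemble idea concrete, here is a minimal sketch of an attack that ascends the loss averaged over several surrogate models under an L-infinity budget. It is an illustration under strong assumptions, not the paper's method: the surrogates are hypothetical linear logistic classifiers generated at random, and the names `ensemble_loss_grad`, `ensemble_attack`, `eps`, `step`, and `iters` are invented for this example.

```python
import numpy as np

def ensemble_loss_grad(x, y, weights):
    """Mean logistic loss over T linear surrogates and its gradient w.r.t. x.

    weights: array of shape (T, d), one hypothetical surrogate per row.
    y: true label, either -1.0 or +1.0.
    """
    margins = y * (weights @ x)                  # shape (T,)
    losses = np.logaddexp(0.0, -margins)         # log(1 + exp(-margin)), stable
    sig = 0.5 * (1.0 - np.tanh(margins / 2.0))   # sigmoid(-margin), stable
    grads = -(y * sig)[:, None] * weights        # per-model gradient, shape (T, d)
    return losses.mean(), grads.mean(axis=0)

def ensemble_attack(x, y, weights, eps=0.1, step=0.02, iters=20):
    """Iterative sign-gradient ascent on the averaged surrogate loss."""
    x_adv = x.copy()
    for _ in range(iters):
        _, g = ensemble_loss_grad(x_adv, y, weights)
        x_adv = x_adv + step * np.sign(g)          # increase the ensemble loss
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # stay inside the L_inf ball
    return x_adv

# Toy setup (assumption): surrogates and the unseen target share a common direction.
rng = np.random.default_rng(0)
d, T = 50, 8
shared = rng.normal(size=d)
surrogates = shared + 0.5 * rng.normal(size=(T, d))
target = shared + 0.5 * rng.normal(size=d)

x = rng.normal(size=d)
y = 1.0 if target @ x >= 0 else -1.0   # label assigned by the target model
x_adv = ensemble_attack(x, y, surrogates)

loss_before, _ = ensemble_loss_grad(x, y, surrogates)
loss_after, _ = ensemble_loss_grad(x_adv, y, surrogates)
target_margin_before = y * (target @ x)
target_margin_after = y * (target @ x_adv)   # lower margin means closer to misclassification
```

The target model is never queried for gradients; any drop in `target_margin_after` comes purely from transfer. In this toy setting the number of rows in `surrogates` plays the role of the ensemble size that the paper scales up.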
### Research Methods
1. **Theoretical Analysis**:
- The paper first analyzes, theoretically, the relationship between the number of surrogate models and the transferability of adversarial examples. The analysis of the convergence of model ensembles provides the motivation for the empirical study.
2. **Experimental Verification**:
- The authors test their theoretical hypothesis through extensive experiments covering standard image classifiers, multimodal large language models, and proprietary models (such as GPT-4o). The results show that the attack success rate increases significantly as the number of surrogate models grows, and that the change in the loss shows a clear symmetry.
3. **Visualization Analysis**:
- Visualization shows that the semantic interpretability of the adversarial perturbations also improves as the number of surrogate models increases, indicating that the adversarial examples capture features on the natural image manifold that are common to different models.
### Main Findings
1. **Scaling Law**:
- The paper finds an explicit scaling law relating the number of surrogate models to the attack success rate in black-box adversarial attacks. Specifically, the attack success rate \( \text{ASR} \) grows logarithmically with the number of surrogate models \( T \):
\[
\text{ASR}=\alpha\log T + C
\]
- Here, \( \alpha \) and \( C \) are constants that depend on factors such as hyperparameters, model architecture, and the adversarial target.
2. **Model Robustness**:
- The experimental results also show that multimodal models (such as vision-language models) are more robust to these attacks, which supports the view that multimodal training can produce more robust representations.
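The logarithmic relationship in item 1 can be checked against measurements with an ordinary least-squares fit of ASR on \( \log T \). The sketch below is a hedged illustration: the helper names `fit_log_scaling` and `predict_asr` are invented here, and the data points are synthetic values generated from made-up constants, not results from the paper.

```python
import numpy as np

def fit_log_scaling(num_models, asr):
    """Least-squares fit of ASR = alpha * log(T) + C; returns (alpha, C)."""
    log_t = np.log(np.asarray(num_models, dtype=float))
    alpha, c = np.polyfit(log_t, np.asarray(asr, dtype=float), deg=1)
    return alpha, c

def predict_asr(num_models, alpha, c):
    """Evaluate the fitted law at the given ensemble sizes."""
    return alpha * np.log(np.asarray(num_models, dtype=float)) + c

# Synthetic, noise-free data from made-up constants (illustration only).
true_alpha, true_c = 0.12, 0.30
T = np.array([1, 2, 4, 8, 16, 32])
asr = true_alpha * np.log(T) + true_c

alpha_hat, c_hat = fit_log_scaling(T, asr)
asr_at_64 = predict_asr(64, alpha_hat, c_hat)  # extrapolation to a larger ensemble
```

Because a success rate cannot exceed 1, a law of this form can only describe a limited range of \( T \); extrapolations such as `asr_at_64` should be treated with caution.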
### Applications and Significance
1. **Black-Box Model Attack**:
- The scaling law lets researchers generate adversarial examples for black-box models more effectively, which is valuable for evaluating model security and developing more robust algorithms.
2. **Vulnerability of Multimodal Models**:
- Although multimodal models are more robust to these attacks, the experimental results still reveal their potential vulnerability, underscoring the need for more secure multimodal foundation models.
### Conclusion
Through theoretical analysis and experiments, this paper establishes a scaling law between the number of surrogate models and the success rate of black-box adversarial attacks, offering a new perspective on improving the transferability of adversarial examples. The finding helps explain the nature of adversarial attacks and provides a useful reference for developing more secure deep learning models.