Abstract: Model ensemble adversarial attack has become a powerful method for generating transferable adversarial examples that can target even unknown models, but its theoretical foundation remains underexplored. To address this gap, we provide early theoretical insights that serve as a roadmap for advancing model ensemble adversarial attack. We first define transferability error to measure the error in adversarial transferability, alongside concepts of diversity and empirical model ensemble Rademacher complexity. We then decompose the transferability error into vulnerability, diversity, and a constant, which rigorously explains the origin of transferability error in model ensemble attack: the vulnerability of an adversarial example to ensemble components, and the diversity of ensemble components. Furthermore, we apply the latest mathematical tools in information theory to bound the transferability error using complexity and generalization terms, contributing to three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Finally, extensive experiments with 54 models validate our theoretical framework, representing a significant step forward in understanding transferable model ensemble adversarial attacks.

What problem does this paper attempt to address?

The problem this paper addresses is that **the theoretical foundation of model ensemble adversarial attacks remains underdeveloped**. Although model ensemble attacks are very effective at generating transferable adversarial examples, the mechanisms behind them have not been fully explored. The paper therefore offers preliminary theoretical insights to fill this gap and to guide the design of future algorithms.

### Main problems

1. **Insufficient theoretical basis**: Model ensemble adversarial attacks perform well in practice, but their theory has not been studied in depth, so the effectiveness and limitations of these attacks are only partly understood.
2. **Sources of transferability error**: It is unclear why some adversarial examples transfer well between different models while others do not.
3. **Strategies to improve transferability**: Concrete strategies are needed to reduce transferability error and thereby improve the transferability of adversarial examples.

### Solutions

To address these problems, the authors introduce the following key concepts:

1. **Transferability Error**:
   - It is defined as the gap between the expected loss of the most transferable adversarial example and the expected loss of a given adversarial example.
   - It is expressed by the formula:
     \[ TE(z, \epsilon) = L_P(z^*) - L_P(z) \]
   - Here \( L_P(z^*) \) is the expected loss of the optimal adversarial example, and \( L_P(z) \) is the expected loss of the given adversarial example.
2. **Diversity**:
   - It is defined as the variance of the predictions across the model ensemble and quantifies how much the ensemble members disagree.
   - It is expressed by the formula:
     \[ \text{Var}_{\theta \sim P_\Theta}(f(\theta; x)) = E_{\theta \sim P_\Theta}\left[\left(f(\theta; x) - E_{\theta \sim P_\Theta} f(\theta; x)\right)^2\right] \]
3. **Empirical Model Ensemble Rademacher Complexity**:
   - It is defined as the complexity of the model ensemble over the input space and measures the flexibility of the ensemble.
   - It is expressed by the formula, with \( z = (x, y) \):
     \[ R_N(Z) = E_\sigma \left[ \sup_{z \in Z} \frac{1}{N} \sum_{i = 1}^N \sigma_i \ell(f(\theta_i; x), y) \right] \]

### Theoretical contributions

1. **Vulnerability-Diversity Decomposition**:
   - The transferability error is decomposed into a vulnerability term, a diversity term, and the constant \( L_P(z^*) \).
   - It is expressed by the formula:
     \[ TE(z, \epsilon) = L_P(z^*) - \ell(\tilde{f}(\theta; x), y) - \text{Var}_{\theta \sim P_\Theta}(f(\theta; x)) \]
   - Here \(\tilde{f}(\theta; x) = E_{\theta \sim P_\Theta} f(\theta; x)\) is the expected prediction over the parameter space.
2. **Upper Bound of Transferability Error**:
   - The paper derives an upper bound on the transferability error that combines the empirical model ensemble Rademacher complexity with a generalization term.
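
The diversity, decomposition, and Rademacher complexity formulas above can be checked numerically on a toy ensemble. The sketch below is illustrative only and is not the authors' code. It assumes scalar model outputs and a squared loss \( \ell(a, y) = (a - y)^2 \), under which the expected loss splits exactly into vulnerability plus diversity; the summary does not state which loss the paper uses. It also replaces the supremum over \( Z \) with a maximum over a finite candidate set. The function names `vulnerability_and_diversity` and `empirical_rademacher` are hypothetical.

```python
import numpy as np


def vulnerability_and_diversity(preds, y):
    """Split the ensemble loss on one adversarial example (assumes squared loss).

    preds: shape (N,), scalar outputs f(theta_i; x) of N surrogate models.
    y: scalar target.
    Returns (vulnerability, diversity, expected_loss).
    """
    preds = np.asarray(preds, dtype=float)
    mean_pred = preds.mean()                    # averaged prediction, f-tilde
    vulnerability = (mean_pred - y) ** 2        # loss of the averaged prediction
    diversity = preds.var()                     # population variance across models
    expected_loss = np.mean((preds - y) ** 2)   # empirical counterpart of L_P(z)
    return vulnerability, diversity, expected_loss


def empirical_rademacher(losses, n_draws=2000, seed=0):
    """Monte Carlo estimate of R_N(Z) over a finite candidate set (assumption).

    losses: shape (M, N), losses[j, i] is the loss of model i on candidate z_j.
    """
    losses = np.asarray(losses, dtype=float)
    rng = np.random.default_rng(seed)
    n_models = losses.shape[1]
    # Each row is one draw of the Rademacher variables sigma_1..sigma_N.
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n_models))
    correlations = sigma @ losses.T / n_models  # shape (n_draws, M)
    # Max over candidates stands in for the supremum; the mean approximates E_sigma.
    return correlations.max(axis=1).mean()


if __name__ == "__main__":
    v, d, loss = vulnerability_and_diversity([0.9, 0.4, 0.7, 0.2], y=1.0)
    # Under squared loss the identity expected_loss = vulnerability + diversity holds.
    print(np.isclose(loss, v + d))
    print(empirical_rademacher([[1.0, 2.0], [3.0, 0.0]]))
```

With the same seed, adding candidates to the set can only raise the estimate, because the maximum is taken draw by draw. This matches the reading of \( R_N(Z) \) as a measure of how flexibly the ensemble's losses can align with random signs.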

Understanding Model Ensemble in Transferable Adversarial Attack
