Understanding Model Ensemble in Transferable Adversarial Attack

Wei Yao, Zeliang Zhang, Huayi Tang, Yong Liu
2024-10-09
Abstract: Model ensemble adversarial attack has become a powerful method for generating transferable adversarial examples that can target even unknown models, but its theoretical foundation remains underexplored. To address this gap, we provide early theoretical insights that serve as a roadmap for advancing model ensemble adversarial attack. We first define transferability error to measure the error in adversarial transferability, alongside concepts of diversity and empirical model ensemble Rademacher complexity. We then decompose the transferability error into vulnerability, diversity, and a constant, which rigidly explains the origin of transferability error in model ensemble attack: the vulnerability of an adversarial example to ensemble components, and the diversity of ensemble components. Furthermore, we apply the latest mathematical tools in information theory to bound the transferability error using complexity and generalization terms, contributing to three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Finally, extensive experiments with 54 models validate our theoretical framework, representing a significant step forward in understanding transferable model ensemble adversarial attacks.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is that **the theoretical foundation of model ensemble adversarial attacks remains underexplored**. Although model ensemble adversarial attacks are very effective at generating transferable adversarial examples, the theoretical mechanisms behind them have not been fully explored. The paper therefore aims to provide early theoretical insights that fill this gap and guide the development of future algorithms.

### Main problems

1. **Insufficient theoretical basis**: Model ensemble adversarial attacks perform well in practice, but their theoretical basis has not been studied in depth. As a result, the effectiveness and limitations of these attacks are only partly understood.
2. **Sources of transferability error**: The sources of transferability error need to be clarified, that is, why some adversarial examples transfer well between different models while others do not.
3. **Strategies to improve transferability**: Specific strategies are needed to reduce transferability error and thus improve the transferability of adversarial examples.

### Solutions

To address the above problems, the authors propose the following key concepts and theoretical frameworks:

1. **Transferability Error**:
   - It is defined as the gap between the expected loss of an adversarial example and the expected loss of the most transferable adversarial example.
   - It is expressed by the formula: \[ TE(z, \epsilon) = L_P(z^*) - L_P(z) \]
   - Where \( L_P(z^*) \) is the expected loss of the optimal (most transferable) adversarial example, and \( L_P(z) \) is the expected loss of a given adversarial example.
2. **Diversity**:
   - It is defined as the variance of the predictions across the model ensemble and is used to quantify the diversity between models.
   - It is expressed by the formula: \[ \text{Var}_{\theta \sim P_\Theta}(f(\theta; x)) = E_{\theta \sim P_\Theta}\left[ \left( f(\theta; x) - E_{\theta \sim P_\Theta} f(\theta; x) \right)^2 \right] \]
3. **Empirical Model Ensemble Rademacher Complexity**:
   - It is defined as the complexity of the model ensemble in the input space and is used to measure the flexibility of the model ensemble.
   - It is expressed by the formula: \[ R_N(Z) = E_\sigma \left[ \sup_{z \in Z} \frac{1}{N} \sum_{i = 1}^N \sigma_i \ell(f(\theta_i; x), y) \right] \]
   - Where the \( \sigma_i \) are independent Rademacher variables, each taking the values \( +1 \) and \( -1 \) with equal probability.

### Theoretical contributions

1. **Vulnerability-Diversity Decomposition**:
   - The transferability error is decomposed into vulnerability, diversity, and a constant.
   - It is expressed by the formula: \[ TE(z, \epsilon) = L_P(z^*) - \ell(\tilde{f}(\theta; x), y) - \text{Var}_{\theta \sim P_\Theta} f(\theta; x) \]
   - Where \(\tilde{f}(\theta; x) = E_{\theta \sim P_\Theta} f(\theta; x)\) is the expected prediction over the parameter space. The term \( L_P(z^*) \) does not depend on the chosen example \( z \) and acts as the constant, \( \ell(\tilde{f}(\theta; x), y) \) is the vulnerability term, and the variance is the diversity term.
2. **Upper Bound of Transferability Error**:
   - An upper bound on the transferability error is derived, combining the empirical model ensemble Rademacher complexity and a generalization term.
   - It is expressed by the formula: \[