Abstract: Despite the tremendous success of deep neural networks across various tasks, their vulnerability to imperceptible adversarial perturbations has hindered their deployment in the real world. Recently, works on randomized ensembles have empirically demonstrated significant improvements in adversarial robustness over standard adversarially trained (AT) models with minimal computational overhead, making them a promising solution for safety-critical resource-constrained applications. However, this impressive performance raises the question: Are these robustness gains provided by randomized ensembles real? In this work we address this question both theoretically and empirically. We first establish theoretically that commonly employed robustness evaluation methods such as adaptive PGD provide a false sense of security in this setting. Subsequently, we propose a theoretically-sound and efficient adversarial attack algorithm (ARC) capable of compromising random ensembles even in cases where adaptive PGD fails to do so. We conduct comprehensive experiments across a variety of network architectures, training schemes, datasets, and norms to support our claims, and empirically establish that randomized ensembles are in fact more vulnerable to $\ell_p$-bounded adversarial perturbations than even standard AT models. Our code can be found at https://github.com/hsndbk4/ARC.

What problem does this paper attempt to address?

This paper asks a key question: **Is the improvement in adversarial robustness provided by randomized ensembles real and reliable?** Recent research reports that randomized ensemble models achieve significant robustness gains against adversarial attacks at a relatively small computational cost, but it remains doubtful whether these gains reflect the models' actual robustness. The authors investigate the question from both theoretical and empirical perspectives.

#### Decomposition of the main problems:

1. **Effectiveness of standard attack algorithms**:
   - The paper first points out that commonly used attack algorithms such as adaptive projected gradient descent (adaptive PGD, APGD) may be fundamentally flawed when attacking randomized ensemble models. Specifically, APGD is not guaranteed to find an $\ell_p$-bounded adversarial perturbation, even when such a perturbation exists.
2. **Proposing a new attack algorithm**:
   - To address this problem, the authors propose a new attack algorithm, **ARC (Attacking Randomized ensembles of Classifiers)**. It is designed specifically to evaluate the robustness of randomized ensemble models against $\ell_p$-bounded adversarial perturbations.
3. **Experimental verification**:
   - Through extensive experiments, the authors show that randomized ensemble models are actually more vulnerable to ARC attacks than standard adversarially trained (AT) models. This indicates that the robustness gains reported by previous APGD-based evaluations may reflect a false sense of security.

#### Formula summary:

- **Auxiliary classifier**:
  \[ \bar{f}(x)=\left(\sum_{i = 1}^M\alpha_i\lambda_iw_i\right)^Tx+\bar{b} \]
  where $w_i$ and $b_i$ are the weights and biases of the $i$-th binary linear classifier, respectively.
- **APGD update rule**:
  \[ \delta^{(k)}=\Pi_{p,\epsilon}\left(\delta^{(k - 1)}+\eta\mu_p\left(\nabla_xl(x+\delta^{(k - 1)},y)\right)\right) \]
- **ARC algorithm update rule**:
  \[ \hat{\delta}=\gamma(\delta+\beta g) \]
  where $g$ is the optimal unit $\ell_p$-norm perturbation direction that causes the classifier $f_i$ to misclassify.

Through these studies, the paper reveals potential problems in how the adversarial robustness of randomized ensemble models is evaluated and proposes a more effective evaluation method.
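The two update rules in the formula summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names `pgd_step_linf`, `arc_style_update`, and `linear_l2_direction` are hypothetical. The sketch assumes the $\ell_\infty$ case for the PGD step, where $\mu_p$ is the sign function and $\Pi_{p,\epsilon}$ is coordinate-wise clipping. It also assumes that $\gamma$ rescales the candidate perturbation so that its $\ell_p$ norm equals $\epsilon$, which the summary does not state explicitly.

```python
import numpy as np


def pgd_step_linf(delta, grad, eta, eps):
    """One l_inf PGD step: signed-gradient ascent, then projection onto the eps-ball."""
    stepped = delta + eta * np.sign(grad)  # mu_p for p = inf: sign of the gradient
    return np.clip(stepped, -eps, eps)     # Pi_{inf, eps}: coordinate-wise clipping


def arc_style_update(delta, g, beta, eps, p=2):
    """Candidate perturbation gamma * (delta + beta * g).

    Assumption: gamma rescales the candidate so its l_p norm equals eps.
    """
    candidate = delta + beta * g
    norm = np.linalg.norm(candidate.ravel(), ord=p)
    if norm == 0.0:
        return candidate  # nothing to rescale
    return (eps / norm) * candidate


def linear_l2_direction(w, y):
    """Unit l_2 direction that most decreases the margin y * (w @ x + b)."""
    return -y * w / np.linalg.norm(w)


# Toy usage with made-up numbers (label y = +1, linear classifier weights w).
w = np.array([3.0, 4.0])
delta = np.zeros(2)
g = linear_l2_direction(w, y=1)
delta_hat = arc_style_update(delta, g, beta=0.5, eps=0.1, p=2)
delta_pgd = pgd_step_linf(delta, grad=-w, eta=0.05, eps=0.1)
```

In this toy case `delta_hat` points along $-w$ and has $\ell_2$ norm exactly $\epsilon$, while the PGD step moves every coordinate by the same amount $\eta$ regardless of the gradient magnitude.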

Adversarial Vulnerability of Randomized Ensembles

An Empirical Investigation of Randomized Defenses against Adversarial Attacks

Adversarial Defense Via Self-Orthogonal Randomization Super-Network.

Adversarial Robust Decision-Making under Uncertainty Learning and Dynamic Ensemble Selection

Demystifying the Adversarial Robustness of Random Transformation Defenses

Ensemble Methods as a Defense to Adversarial Perturbations Against Deep Neural Networks

Improving Adversarial Robustness via Promoting Ensemble Diversity.

Revisiting Ensembles in an Adversarial Context: Improving Natural Accuracy

Improving Adversarial Robustness Via Promoting Ensemble Diversity

Certifying Joint Adversarial Robustness for Model Ensembles

Adversarial Attacks Neutralization via Data Set Randomization

Guardian of the Ensembles: Introducing Pairwise Adversarially Robust Loss for Resisting Adversarial Attacks in DNN Ensembles

Dynamic Defense Approach for Adversarial Robustness in Deep Neural Networks via Stochastic Ensemble Smoothed Model

Dynamic ensemble selection based on Deep Neural Network Uncertainty Estimation for Adversarial Robustness

Improved Robustness Against Adaptive Attacks With Ensembles and Error-Correcting Output Codes

On Adversarial Robustness: A Neural Architecture Search perspective

Ensemble Adversarial Training: Attacks and Defenses

Not So Robust After All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks

Analysis of Random Perturbations for Robust Convolutional Neural Networks

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

Local Competition and Uncertainty for Adversarial Robustness in Deep Learning