Adversarial Vulnerability of Randomized Ensembles

Hassan Dbouk, Naresh R. Shanbhag
DOI: https://doi.org/10.48550/arXiv.2206.06737
2022-06-14
Abstract: Despite the tremendous success of deep neural networks across various tasks, their vulnerability to imperceptible adversarial perturbations has hindered their deployment in the real world. Recently, works on randomized ensembles have empirically demonstrated significant improvements in adversarial robustness over standard adversarially trained (AT) models with minimal computational overhead, making them a promising solution for safety-critical resource-constrained applications. However, this impressive performance raises the question: Are these robustness gains provided by randomized ensembles real? In this work we address this question both theoretically and empirically. We first establish theoretically that commonly employed robustness evaluation methods such as adaptive PGD provide a false sense of security in this setting. Subsequently, we propose a theoretically-sound and efficient adversarial attack algorithm (ARC) capable of compromising random ensembles even in cases where adaptive PGD fails to do so. We conduct comprehensive experiments across a variety of network architectures, training schemes, datasets, and norms to support our claims, and empirically establish that randomized ensembles are in fact more vulnerable to $\ell_p$-bounded adversarial perturbations than even standard AT models. Our code can be found at https://github.com/hsndbk4/ARC.
Machine Learning, Cryptography and Security, Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper asks a key question: **Are the adversarial robustness gains provided by randomized ensembles real and reliable?** Recent work has shown that randomized ensemble models achieve significant robustness improvements against adversarial attacks at a relatively small computational cost, but it remains unclear whether these reported improvements reflect the models' actual robustness. The authors address this question both theoretically and empirically.

#### Decomposition of the main problems:

1. **Effectiveness of standard attack algorithms**:
   - The paper first shows that commonly used attack algorithms such as adaptive projected gradient descent (adaptive PGD, APGD) can have fundamental flaws when attacking randomized ensemble models. Specifically, APGD is not guaranteed to find an $\ell_p$-bounded adversarial perturbation, even when such a perturbation exists.
2. **Proposing a new attack algorithm**:
   - To address this, the authors propose a new attack algorithm, **ARC (Attacking Randomized ensembles of Classifiers)**, designed specifically to evaluate the robustness of randomized ensemble models against $\ell_p$-bounded adversarial perturbations.
3. **Experimental verification**:
   - Through extensive experiments, the authors show that, under ARC, randomized ensemble models are more vulnerable to $\ell_p$-bounded adversarial perturbations than even standard adversarially trained (AT) models. This indicates that the robustness gains measured in previous APGD-based evaluations may give a false sense of security.

#### Formula summary:

**Auxiliary classifier**: for a randomized ensemble of binary linear classifiers, the paper analyzes APGD through the deterministic linear classifier

\[ \bar{f}(x)=\left(\sum_{i = 1}^M\alpha_i\lambda_iw_i\right)^Tx+\bar{b} \]

where $w_i$ and $b_i$ are the weights and bias of the $i$-th binary linear classifier $f_i(x)=w_i^Tx+b_i$ among $M$ classifiers, $\alpha_i$ is the probability of sampling $f_i$, and the scalars $\lambda_i$ and the bias $\bar{b}$ are defined in the paper.
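To make the auxiliary-classifier view concrete, the sketch below checks numerically that, for a toy randomized ensemble of binary linear classifiers under the logistic loss $\log(1+e^{-y f_i(x)})$, the gradient of the expected loss equals $-y\,\bar{w}$ with $\bar{w}=\sum_i \alpha_i\lambda_i w_i$. Taking $\lambda_i = \sigma(-y f_i(x))$ is an assumption on our part (it is what the logistic loss yields); the paper's general definition may differ, and the ensemble size, random weights, and helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy ensemble: M binary linear classifiers f_i(x) = w_i^T x + b_i in d dims.
rng = np.random.default_rng(0)
M, d = 3, 5
W = rng.normal(size=(M, d))        # row i holds w_i
b = rng.normal(size=M)             # biases b_i
alpha = np.array([0.5, 0.3, 0.2])  # sampling probabilities alpha_i (sum to 1)
x = rng.normal(size=d)
y = 1.0                            # binary label in {-1, +1}

def expected_loss_grad(x, y):
    """Gradient of E_i[log(1 + exp(-y f_i(x)))] w.r.t. x (assumed logistic loss)."""
    margins = y * (W @ x + b)
    return -(alpha * sigmoid(-margins) * y) @ W

# Assumed per-member weights for the logistic loss: lambda_i = sigmoid(-y f_i(x)) > 0.
lam = sigmoid(-y * (W @ x + b))
w_bar = (alpha * lam) @ W          # auxiliary weight vector sum_i alpha_i lambda_i w_i

# The expected-loss gradient points along -y * w_bar, i.e. the same direction an
# attacker would follow against the single linear classifier with weights w_bar.
grad = expected_loss_grad(x, y)
print(np.allclose(grad, -y * w_bar))
```

Because $\lambda_i$ depends on $x$, the auxiliary classifier changes from one iteration to the next, but at every step the expected-loss gradient follows one averaged direction rather than the decision boundary of any individual member.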
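**APGD update rule**:

\[ \delta^{(k)}=\Pi_{p,\epsilon}\left(\delta^{(k - 1)}+\eta\,\mu_p\left(\nabla_x l(x+\delta^{(k - 1)},y)\right)\right) \]

where $\Pi_{p,\epsilon}$ projects onto the $\ell_p$ ball of radius $\epsilon$, $\eta$ is the step size, $\mu_p$ maps the gradient to a unit-$\ell_p$-norm steepest-ascent direction (e.g., the sign function for $\ell_\infty$), and $l$ is the loss of the randomized ensemble, i.e., its expected loss over the sampled classifiers.

**ARC algorithm update rule**:

\[ \hat{\delta}=\gamma(\delta+\beta g) \]

where $g$ is the optimal unit-$\ell_p$-norm perturbation direction that causes the classifier $f_i$ to misclassify, $\beta > 0$ weights this new direction, and $\gamma$ rescales the update so that $\hat{\delta}$ stays within the $\ell_p$ ball of radius $\epsilon$.

Together, these results reveal potential problems in how the adversarial robustness of randomized ensemble models has been evaluated, and the paper proposes a more effective evaluation method.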
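The toy example below contrasts the two update rules. It is a minimal sketch under stated assumptions, not the authors' implementation. Two hypothetical linear classifiers $w_1=(1,0)$ and $w_2=(-1,0)$, both with bias 1 and sampled with equal probability, classify $x=0$ correctly. Starting from $\delta=0$, their expected-loss gradients cancel, so the APGD step never moves. A single ARC-style step along member 1's direction lowers the expected accuracy to 0.5. The logistic loss, the directions $g=-y\,w_i/\|w_i\|_2$ (for $\ell_2$) and $g=-y\,\mathrm{sign}(w_i)$ (for $\ell_\infty$), the choice $\gamma=\epsilon/\|\delta+\beta g\|_p$, and all function names are our assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy randomized ensemble: f_i(x) = w_i^T x + b_i, sampled with probability alpha_i.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.array([1.0, 1.0])
alpha = np.array([0.5, 0.5])

def expected_accuracy(x, y):
    """Probability that a randomly sampled member classifies (x, y) correctly."""
    correct = (y * (W @ x + b) > 0).astype(float)
    return float(alpha @ correct)

def expected_loss_grad(x, y):
    """Gradient of E_i[log(1 + exp(-y f_i(x)))] w.r.t. x (assumed logistic loss)."""
    margins = y * (W @ x + b)
    return -(alpha * sigmoid(-margins) * y) @ W

def lp_norm(v, p):
    return np.linalg.norm(v, ord=2 if p == 2 else np.inf)

def apgd_step(x, y, delta, eps, eta, p):
    """One adaptive-PGD step on the expected loss; p is 2 or np.inf."""
    g = expected_loss_grad(x + delta, y)
    if p == 2:
        n = np.linalg.norm(g)
        mu = g / n if n > 0 else np.zeros_like(g)  # unit-l2 steepest-ascent direction
        d = delta + eta * mu
        n = np.linalg.norm(d)
        return d if n <= eps else d * (eps / n)    # projection onto the l2 ball
    return np.clip(delta + eta * np.sign(g), -eps, eps)  # l_inf: sign step, then clip

def arc_style_step(y, delta, i, eps, beta, p):
    """Hypothetical ARC-style update delta_hat = gamma * (delta + beta * g) for member i."""
    w = W[i]
    g = -y * (w / np.linalg.norm(w) if p == 2 else np.sign(w))  # unit-l_p direction
    d = delta + beta * g
    n = lp_norm(d, p)
    gamma = eps / n if n > 0 else 0.0  # assumed: rescale onto the eps-sphere
    return gamma * d

x, y, eps = np.zeros(2), 1.0, 1.5
results = {}
for p in (2, np.inf):
    delta = np.zeros(2)
    for _ in range(10):
        delta = apgd_step(x, y, delta, eps, eta=0.5, p=p)
    delta_arc = arc_style_step(y, np.zeros(2), i=0, eps=eps, beta=1.0, p=p)
    results[p] = (expected_accuracy(x + delta, y), expected_accuracy(x + delta_arc, y))
    print(p, results[p])
```

This symmetric construction is ours and only illustrates how the expected-loss gradient can vanish even though an $\epsilon$-bounded perturbation that fools one member exists; a random restart would escape this particular case.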