Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?

Peter Lorenz, Dominik Strassel, Margret Keuper, Janis Keuper
2024-02-20
Abstract: Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image classification networks. In its most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l-inf perturbations limited to eps = 8/255. With leading scores of the currently best performing models of around 60% of the baseline, it is fair to characterize this benchmark as quite challenging. Despite its general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator for robustness which could be generalized to practical applications. Our line of argumentation against this is two-fold and supported by extensive experiments presented in this paper. We argue that I) the alteration of data by AutoAttack with l-inf, eps = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers. We also show that other attack methods are much harder to detect while achieving similar success rates. II) Results on low-resolution data sets like CIFAR10 do not generalize well to higher resolution images, as gradient-based attacks appear to become even more detectable with increasing resolutions.
Computer Vision and Pattern Recognition, Cryptography and Security
What problem does this paper attempt to address?
This paper questions whether RobustBench, together with the AutoAttack framework it uses by default, is a suitable benchmark for evaluating the adversarial robustness of image classification models. The authors present two main arguments:

1. **Excessive perturbation strength of AutoAttack**: With the \( l_{\infty} \) norm and \(\epsilon = 8/255\), AutoAttack alters the data unrealistically strongly, so that even simple detection algorithms detect the adversarial samples almost perfectly. Such attacks would therefore be very difficult to carry out successfully in practical applications.
2. **Results on low-resolution datasets do not generalize well to high-resolution images**: Results obtained on low-resolution datasets such as CIFAR-10 cannot be transferred directly to high-resolution images, because gradient-based attacks become more easily detectable as the image resolution increases.

The authors support both arguments with experiments that evaluate adversarial samples generated by AutoAttack across different datasets and detection methods. These results indicate that the effectiveness of AutoAttack on high-resolution images decreases significantly, and that other attack methods are in some cases harder to detect than AutoAttack.

In summary, the paper aims to initiate a discussion about the suitability of RobustBench and its AutoAttack framework as an adversarial robustness benchmark, and it argues for more realistic and comprehensive evaluation methods to measure the robustness of models in practical applications.
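To make the \( l_{\infty} \) budget of \(\epsilon = 8/255\) concrete, the following is a minimal sketch of what the constraint means for pixel values scaled to [0, 1]: no pixel of the adversarial image may differ from the clean image by more than 8/255, which is about 0.031, or 8 intensity levels in an 8-bit image. This is not AutoAttack itself and not code from the paper; the helper names `project_linf` and `linf_distance` and the toy four-pixel "image" are assumptions made purely for illustration.

```python
EPS = 8 / 255  # l_inf budget quoted in the abstract (about 0.031)


def project_linf(x_clean, x_adv, eps=EPS):
    """Clamp x_adv so each pixel stays within eps of x_clean and inside [0, 1].

    Hypothetical helper for illustration; images are flat lists of floats.
    """
    projected = []
    for c, a in zip(x_clean, x_adv):
        a = min(max(a, c - eps), c + eps)  # stay inside the l_inf ball
        a = min(max(a, 0.0), 1.0)          # stay a valid pixel intensity
        projected.append(a)
    return projected


def linf_distance(x_clean, x_adv):
    """Largest absolute per-pixel difference between two images."""
    return max(abs(a - c) for c, a in zip(x_clean, x_adv))


# Toy example: four pixels and a candidate perturbation that is too large.
clean = [0.0, 0.5, 1.0, 0.25]
candidate = [0.2, 0.45, 0.9, 0.26]
adv = project_linf(clean, candidate)

# After projection the perturbation respects the budget (up to rounding).
print(linf_distance(clean, adv) <= EPS + 1e-12)
```

Gradient-based attacks in this threat model typically use most of the available budget on many pixels at once, which is consistent with the paper's argument that such perturbations are comparatively easy to detect.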