Abstract: Empirical robustness evaluation (RE) of deep learning models against adversarial perturbations entails solving nontrivial constrained optimization problems. Existing numerical algorithms that are commonly used to solve them in practice predominantly rely on projected gradient, and mostly handle perturbations modeled by the $\ell_1$, $\ell_2$ and $\ell_\infty$ distances. In this paper, we introduce a novel algorithmic framework that blends a general-purpose constrained-optimization solver PyGRANSO with Constraint Folding (PWCF), which can add more reliability and generality to the state-of-the-art RE packages, e.g., AutoAttack. Regarding reliability, PWCF provides solutions with stationarity measures and feasibility tests to assess the solution quality. For generality, PWCF can handle perturbation models that are typically inaccessible to the existing projected gradient methods; the main requirement is the distance metric to be almost everywhere differentiable. Taking advantage of PWCF and other existing numerical algorithms, we further explore the distinct patterns in the solutions found for solving these optimization problems using various combinations of losses, perturbation models, and optimization algorithms. We then discuss the implications of these patterns on the current robustness evaluation and adversarial training.
What problem does this paper attempt to address?
The paper aims to make robustness evaluation (RE) of deep learning models under adversarial perturbations more reliable and more general. It focuses on two core issues:
1. **Lack of reliability**: Existing numerical methods return solutions but offer no way to assess their quality, so it is unclear how far those solutions can be trusted. For example, Figure 1 shows that the default maximum number of iterations (MaxIter) in AutoAttack mostly terminates the optimization prematurely. Because the number of iterations actually required varies from sample to sample, a fixed iteration cap is not an effective stopping rule.
2. **Lack of generality**: Most existing numerical methods target perturbation models based on the $\ell_1$, $\ell_2$ or $\ell_\infty$ distances and cannot handle other distance metrics. For example, the popular RE benchmark RobustBench only provides leaderboards for $\ell_2$ and $\ell_\infty$, and the most commonly studied adversarial attack model is the $\ell_\infty$ attack.
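To make both limitations concrete, here is a minimal NumPy sketch of projected gradient ascent for the max-loss problem under an $\ell_\infty$ budget. It is not AutoAttack's implementation: the toy linear classifier, the function name `pgd_linf`, and all parameter values are assumptions made for illustration. Note that the loop stops only because `max_iter` is reached, and that the projection step is written specifically for the $\ell_\infty$ ball intersected with the box $[0,1]^n$.

```python
import numpy as np

def pgd_linf(x, y, W, eps=0.1, step=0.02, max_iter=50):
    """Projected gradient ascent on a margin loss for a toy linear classifier.

    x: input in [0, 1]^n, y: true class index, W: (num_classes, n) weights.
    All names and defaults are illustrative assumptions.
    """
    x_adv = x.copy()
    for _ in range(max_iter):  # the only stopping rule: a fixed iteration cap
        logits = W @ x_adv
        # Strongest wrong class at the current iterate.
        other = logits.copy()
        other[y] = -np.inf
        j = int(np.argmax(other))
        # Gradient of the margin loss logits[j] - logits[y] w.r.t. x_adv.
        grad = W[j] - W[y]
        # Signed-gradient ascent step (steepest ascent in the l_inf geometry).
        x_adv = x_adv + step * np.sign(grad)
        # Metric-specific projection: l_inf ball around x, then the box [0, 1]^n.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
x = rng.uniform(0.2, 0.8, size=8)
y = int(np.argmax(W @ x))
x_adv = pgd_linf(x, y, W)
linf_dist = float(np.max(np.abs(x_adv - x)))
```

The returned point is feasible by construction, but nothing in the loop indicates whether it is close to a stationary point. Swapping in a different distance would require deriving a new projection, which may not have a closed form.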
To address these problems, the paper proposes a new algorithmic framework, PyGRANSO with Constraint Folding (PWCF), which combines the general-purpose constrained-optimization solver PyGRANSO with a constraint-folding technique to improve the reliability and generality of robustness evaluation. Specific contributions include:
- **Universal solver**: PWCF can handle distance metrics that are differentiable almost everywhere, and is applicable not only to $\ell_1$, $\ell_2$ and $\ell_\infty$ distances, but also to other more complex distance metrics.
- **Reliable stopping criteria**: PWCF comes with principled line-search rules and stopping criteria, and reports constraint violation and a stationarity measure for each solution. Users can use these indicators to judge whether further optimization is needed.
- **Performance comparison**: Experimental results show that PWCF is comparable to existing state-of-the-art RE packages (such as AutoAttack) on both the max-loss form and the min-radius form, while also handling more types of distance metrics and providing diverse solutions.
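The summary does not spell out constraint folding, so the following is a hedged NumPy sketch of one way to realize the idea: each inequality $c_i(x') \le 0$ is turned into a non-negative violation $\max\{c_i(x'), 0\}$, and all violations are collapsed into a single scalar constraint by a norm. The function names (`folded_constraint`, `l1_distance`, `is_feasible`), the choice of the $\ell_2$ norm as the folding function, and the tolerance are assumptions, not the API of PWCF or PyGRANSO. The folded value also serves as a feasibility test, since it is zero exactly when every original constraint holds.

```python
import numpy as np

def l1_distance(x, x_adv):
    # Any almost-everywhere differentiable distance could be plugged in here.
    return float(np.sum(np.abs(x_adv - x)))

def folded_constraint(x, x_adv, eps, distance=l1_distance):
    """Fold the box constraints and the distance budget into one scalar >= 0."""
    violations = np.concatenate([
        np.maximum(-x_adv, 0.0),                # x_adv >= 0, elementwise
        np.maximum(x_adv - 1.0, 0.0),           # x_adv <= 1, elementwise
        [max(distance(x, x_adv) - eps, 0.0)],   # d(x, x_adv) <= eps
    ])
    # Folding function: l2 norm of all violations (an illustrative choice).
    return float(np.linalg.norm(violations, ord=2))

def is_feasible(x, x_adv, eps, tol=1e-8):
    # Feasibility test: the folded violation must be numerically zero.
    return folded_constraint(x, x_adv, eps) <= tol

x = np.array([0.2, 0.5, 0.9])
inside = np.array([0.25, 0.5, 0.85])   # stays in the box and within the budget
outside = np.array([0.2, 0.5, 1.3])    # leaves the box and exceeds the budget
print(is_feasible(x, inside, eps=0.2))
print(is_feasible(x, outside, eps=0.2))
```

In this sketch a solver would see one constraint instead of $2n + 1$, and changing the perturbation model only means passing a different `distance` function.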
The paper also explores the sparsity patterns of the solutions obtained under different combinations of solvers, loss functions and distance metrics, and discusses what these patterns imply for current robustness evaluation and adversarial training. Specifically:
- **Different combinations of distance metrics, loss functions and optimization solvers lead to different sparsity patterns**. These patterns matter when computing robust accuracy, and combining all available patterns can give more reliable and accurate results.
- **Robust accuracy at the commonly used preset perturbation level is usually not a good indicator of robustness**. Solving the min-radius form provides more information about the robustness of each input.
- **Using the projected gradient descent (PGD) method for adversarial training on a single distance metric may not achieve generalized adversarial robustness**, because the adversarially trained model may be only robust to the patterns seen during the training process.
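The point about min-radius solutions can be illustrated with a small sketch. It assumes a per-sample minimal adversarial radius has already been computed by some min-radius solver and treats those radii as exact; the numbers below are made up, and `robust_accuracy` is a hypothetical helper. Robust accuracy at any perturbation level then follows by thresholding, whereas evaluating at a single preset level yields only one point on that curve.

```python
import numpy as np

def robust_accuracy(min_radii, eps):
    """Fraction of samples whose minimal adversarial radius exceeds eps.

    Convention assumed here: a sample misclassified without any
    perturbation is assigned radius 0.
    """
    min_radii = np.asarray(min_radii, dtype=float)
    return float(np.mean(min_radii > eps))

# Made-up radii for five inputs; 0.0 marks an already misclassified input.
radii = [0.0, 0.02, 0.05, 0.20, 0.40]
curve = {eps: robust_accuracy(radii, eps) for eps in (0.0, 0.03, 0.1, 0.3)}
print(curve)
```

The per-sample radii also show which inputs sit close to the decision boundary, information that a single robust-accuracy number hides.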
In summary, the paper introduces the PWCF framework to improve the reliability and generality of robustness evaluation of deep learning models under adversarial perturbations, and to extend that evaluation to a wider range of perturbation models.