Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization

Daniel Kuelbs, Sanjay Lall, Mert Pilanci
2024-10-17
Abstract: Training neural networks which are robust to adversarial attacks remains an important problem in deep learning, especially as heavily overparameterized models are adopted in safety-critical settings. Drawing from recent work which reformulates the training problems for two-layer ReLU and polynomial activation networks as convex programs, we devise a convex semidefinite program (SDP) for adversarial training of two-layer polynomial activation networks and prove that the convex SDP achieves the same globally optimal solution as its nonconvex counterpart. The convex SDP is observed to improve robust test accuracy against $\ell_\infty$ attacks relative to the original convex training formulation on multiple datasets. Additionally, we present scalable implementations of adversarial training for two-layer polynomial and ReLU networks which are compatible with standard machine learning libraries and GPU acceleration. Leveraging these implementations, we retrain the final two fully connected layers of a Pre-Activation ResNet-18 model on the CIFAR-10 dataset with both polynomial and ReLU activations. The two 'robustified' models achieve significantly higher robust test accuracies against $\ell_\infty$ attacks than a Pre-Activation ResNet-18 model trained with sharpness-aware minimization, demonstrating the practical utility of convex adversarial training on large-scale problems.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This paper addresses the vulnerability of neural networks to adversarial attacks. Specifically, the authors study how to train robust two-layer networks with polynomial and ReLU activations using convex optimization. Although deep learning models perform well in many applications, they remain vulnerable to adversarial attacks, which is especially concerning in safety-critical settings. Moreover, standard adversarial training is computationally expensive and typically requires training the model from scratch to achieve the best results.

To address these problems, the authors propose an adversarial training method for two-layer polynomial activation networks based on a convex semidefinite program (SDP), and they prove that this convex SDP attains the same globally optimal solution as the nonconvex adversarial training problem. Experiments show that the method improves robust test accuracy against $\ell_\infty$ attacks on multiple datasets, relative to the original convex training formulation.

The authors also provide scalable adversarial training implementations for two-layer polynomial and ReLU networks that are compatible with standard machine learning libraries and support GPU acceleration. Using these implementations, they retrain the final two fully connected layers of a Pre-Activation ResNet-18 model on CIFAR-10 with polynomial and ReLU activations. Both "robustified" models achieve significantly higher robust test accuracy against $\ell_\infty$ attacks than a Pre-Activation ResNet-18 model trained with sharpness-aware minimization (SAM), demonstrating the practical utility of convex adversarial training on large-scale problems.
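One reason two-layer polynomial activation networks admit convex reformulations is that, for a degree-2 activation, the network output is linear in a set of "lifted" variables. The sketch below illustrates only this identity; it is not the paper's adversarial SDP. It assumes an activation of the form sigma(u) = a*u^2 + b*u + c, and the helper names (`two_layer_output`, `lift`, `lifted_output`) and coefficient values are hypothetical, chosen for illustration. With hidden weights w_j and output weights alpha_j, the sum over j of alpha_j * sigma(w_j^T x) equals a * x^T Z x + b * g^T x + c * s, where Z = sum_j alpha_j w_j w_j^T, g = sum_j alpha_j w_j, and s = sum_j alpha_j.

```python
import numpy as np

def poly_act(u, a, b, c):
    # Assumed degree-2 polynomial activation: sigma(u) = a*u^2 + b*u + c
    return a * u**2 + b * u + c

def two_layer_output(X, W, alpha, a, b, c):
    # X: (n, d) inputs; W: (d, m) hidden weights (column j is w_j); alpha: (m,) output weights
    return poly_act(X @ W, a, b, c) @ alpha

def lift(W, alpha):
    # Lifted variables that make the output linear in the parameters:
    # Z = sum_j alpha_j w_j w_j^T, g = sum_j alpha_j w_j, s = sum_j alpha_j
    Z = (W * alpha) @ W.T
    g = W @ alpha
    s = alpha.sum()
    return Z, g, s

def lifted_output(X, Z, g, s, a, b, c):
    # f(x) = a * x^T Z x + b * g^T x + c * s, computed row-wise for all inputs
    quad = np.einsum("nd,de,ne->n", X, Z, X)
    return a * quad + b * (X @ g) + c * s

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))      # 5 inputs in R^4
W = rng.standard_normal((4, 3))      # 3 hidden neurons
alpha = rng.standard_normal(3)
a, b, c = 0.1, 0.5, 0.2              # arbitrary example coefficients
print(np.allclose(two_layer_output(X, W, alpha, a, b, c),
                  lifted_output(X, *lift(W, alpha), a, b, c)))  # True
```

Because the output depends on the parameters only through (Z, g, s), a convex formulation can optimize over these lifted variables directly, with additional semidefinite constraints on Z; this sketch does not reproduce those constraints or the adversarial terms used in the paper.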