Guardian of the Ensembles: Introducing Pairwise Adversarially Robust Loss for Resisting Adversarial Attacks in DNN Ensembles

Shubhi Shukla, Subhadeep Dalui, Manaar Alam, Shubhajit Datta, Arijit Mondal, Debdeep Mukhopadhyay, Partha Pratim Chakrabarti
2024-12-03
Abstract: Adversarial attacks rely on transferability, where an adversarial example (AE) crafted on a surrogate classifier tends to mislead a target classifier. Recent ensemble methods demonstrate that AEs are less likely to mislead multiple classifiers in an ensemble. This paper proposes a new ensemble training using a Pairwise Adversarially Robust Loss (PARL) that by construction produces an ensemble of classifiers with diverse decision boundaries. PARL utilizes outputs and gradients of each layer with respect to network parameters in every classifier within the ensemble simultaneously. PARL is demonstrated to achieve higher robustness against black-box transfer attacks than previous ensemble methods as well as adversarial training without adversely affecting clean example accuracy. Extensive experiments using standard Resnet20, WideResnet28-10 classifiers demonstrate the robustness of PARL against state-of-the-art adversarial attacks. While maintaining similar clean accuracy and lesser training time, the proposed architecture has a 24.8% increase in robust accuracy ($\epsilon$ = 0.07) from the state-of-the-art method.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the vulnerability of deep neural network (DNN) models to adversarial attacks. Specifically, it proposes a new ensemble training method, **Pairwise Adversarially Robust Loss (PARL)**, to improve the robustness of an ensemble of classifiers against black-box transfer attacks.

#### Main problems:

1. **Transferability of adversarial attacks**: Adversarial examples (AEs) usually transfer across models: an AE crafted on a surrogate model can also mislead the target model. Because of this transferability, defending a single model in isolation is not enough to resist such attacks.
2. **Limitations of existing ensemble methods**: Existing ensemble methods resist adversarial attacks to some extent, but they do not explicitly increase the diversity of decision boundaries between models. As a result, they are less effective against stronger adversaries and can reduce accuracy on clean samples.

#### Solutions:

- **Introducing the PARL loss function**: Using the outputs and gradients of each layer within each model, the PARL loss pushes the classifiers in the ensemble toward different decision boundaries during training. Specifically, PARL does this in two ways:
  - **Gradient orthogonalization**: Make the gradients of different models on the same input as dissimilar (ideally orthogonal) as possible, which reduces the transferability of adversarial examples between models.
  - **Decorrelation of intermediate-layer outputs**: Minimize the correlation between the intermediate-layer outputs of different models, so that each classifier learns its own feature representation of the input.
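
The two pairwise ideas above can be illustrated with a minimal sketch. It is not the paper's actual loss: the exact PARL formulation is not given in this summary, so the use of squared cosine similarity for gradients, Pearson correlation for intermediate-layer outputs, the weights `grad_weight` and `feat_weight`, and all function names are assumptions chosen for illustration. Gradients and layer outputs are assumed to be already flattened into plain lists of floats, one list per classifier.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation, computed as cosine similarity of mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])


def pairwise_penalty(vectors, similarity):
    """Mean squared similarity over all unordered pairs of ensemble members."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(similarity(u, v) ** 2 for u, v in pairs) / len(pairs)


def diversity_penalty(gradients, features, grad_weight=1.0, feat_weight=1.0):
    """Hypothetical combined penalty (an assumption, not the paper's formula).

    gradients: one flattened gradient vector per classifier, same input.
    features:  one flattened intermediate-layer output per classifier.
    The penalty is small when gradients are close to orthogonal and
    layer outputs are close to uncorrelated across every pair of classifiers.
    """
    grad_term = pairwise_penalty(gradients, cosine_similarity)
    feat_term = pairwise_penalty(features, pearson_correlation)
    return grad_weight * grad_term + feat_weight * feat_term


if __name__ == "__main__":
    # Toy ensemble of three classifiers with made-up vectors.
    grads = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    feats = [[0.1, 0.4, 0.9], [0.8, 0.2, 0.3], [0.5, 0.5, 0.6]]
    print(diversity_penalty(grads, feats))
```

In a real training loop such a term would be added to the usual classification loss and computed with an automatic-differentiation framework; squaring the similarity penalizes aligned and anti-aligned pairs alike, which is one plausible design choice rather than something stated in the source.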
#### Experimental verification:

- The paper reports extensive experiments on the CIFAR-10, CIFAR-100 and Tiny ImageNet datasets, using the standard Resnet20 and WideResnet28-10 architectures, to verify the effectiveness of PARL.
- The results show that PARL significantly improves the robustness of the ensemble against black-box transfer attacks, and also reduces training time by about one-third while keeping clean-sample accuracy comparable to the best existing methods.

In conclusion, the PARL loss function gives a systematic way to increase the diversity of decision boundaries among the classifiers in an ensemble, which helps the ensemble resist adversarial attacks while maintaining high classification performance.