IB-RAR: Information Bottleneck as Regularizer for Adversarial Robustness

Xiaoyun Xu, Guilherme Perin, Stjepan Picek
DOI: https://doi.org/10.48550/arXiv.2302.10896
2023-05-31
Abstract: In this paper, we propose a novel method, IB-RAR, which uses Information Bottleneck (IB) to strengthen adversarial robustness for both adversarial training and non-adversarial-trained methods. We first use the IB theory to build regularizers as learning objectives in the loss function. Then, we filter out unnecessary features of intermediate representation according to their mutual information (MI) with labels, as the network trained with IB provides easily distinguishable MI for its features. Experimental results show that our method can be naturally combined with adversarial training and provides consistently better accuracy on new adversarial examples. Our method improves the accuracy by an average of 3.07% against five adversarial attacks for the VGG16 network, trained with three adversarial training benchmarks and the CIFAR-10 dataset. In addition, our method also provides good robustness for undefended methods, such as training with cross-entropy loss only. Finally, in the absence of adversarial training, the VGG16 network trained using our method and the CIFAR-10 dataset reaches an accuracy of 35.86% against PGD examples, while using all layers reaches 25.61% accuracy.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to improve the robustness of deep-learning networks against adversarial attacks. The authors propose IB-RAR (Information Bottleneck as Regularizer for Adversarial Robustness), which uses Information Bottleneck (IB) theory as a regularizer to strengthen adversarial robustness both with and without adversarial training. The main contributions are as follows:

1. **Applying IB as a regularizer**: By embedding mutual information (MI) terms in the loss function, IB-RAR reduces the dependence between the input and the intermediate features while increasing the dependence between the intermediate features and the labels, which improves the model's generalization and adversarial robustness.
2. **Selecting robust layers**: The authors find that hidden layers do not contribute equally to adversarial robustness. Through experiments, they identify the layers that help most (called robust layers) and apply the IB regularizer only to those layers.
3. **Removing unnecessary features**: During training, IB-RAR computes the MI between each feature channel and the labels and removes channels that are uncorrelated with the labels or noisy.
4. **Combining with adversarial training**: IB-RAR combines naturally with existing adversarial training methods (such as PGD, TRADES, and MART). According to the abstract, it improves accuracy by an average of 3.07% against five adversarial attacks for VGG16 trained with three adversarial training benchmarks on CIFAR-10.

The experimental results show that IB-RAR improves accuracy on adversarial examples and also provides robustness when no adversarial training is used. For example, without adversarial training, a VGG16 network trained with IB-RAR on CIFAR-10 reaches 35.86% accuracy against PGD examples, compared with 25.61% when all layers are used.
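The IB regularizer (contribution 1) and the channel filtering step (contribution 3) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The summary does not say which MI estimator is used, so the sketch substitutes the Hilbert-Schmidt Independence Criterion (HSIC) with a Gaussian kernel as a stand-in dependence measure. The function names, the weights `alpha` and `beta`, and the `keep_ratio` rule for selecting channels are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix over the samples in x (first axis = samples)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = (x ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC; ASSUMPTION: used here as a proxy for MI."""
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    kx, ky = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    return float(np.trace(kx @ h @ ky @ h) / (n - 1) ** 2)

def ib_regularized_loss(ce_loss, x, y_onehot, hidden, alpha=1.0, beta=0.1):
    """Cross-entropy plus an IB penalty summed over the chosen layers.

    hidden: list of activations from the selected ("robust") layers.
    The penalty lowers dependence on the input x and raises dependence
    on the labels; alpha and beta are hypothetical weights.
    """
    reg = sum(alpha * hsic(x, t) - beta * hsic(y_onehot, t) for t in hidden)
    return ce_loss + reg

def channel_mask(features, y_onehot, keep_ratio=0.5):
    """Score each channel by its dependence on the labels and keep the top ones.

    features: array of shape (n, channels, height, width).
    Returns a 0/1 mask over channels and the per-channel scores.
    """
    scores = np.array([hsic(features[:, c], y_onehot)
                       for c in range(features.shape[1])])
    k = max(1, int(round(keep_ratio * len(scores))))
    mask = np.zeros(len(scores))
    mask[np.argsort(scores)[-k:]] = 1.0  # keep the k most label-dependent channels
    return mask, scores
```

Multiplying a layer's output by `mask[None, :, None, None]` would zero the discarded channels. NumPy does not provide gradients, so using the loss in an actual training loop would require rewriting the same expressions in an automatic-differentiation framework.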
Overall, the paper introduces IB theory as a way to strengthen the adversarial robustness of deep-learning models, and in particular to further improve models that are already adversarially trained.