Abstract: In this paper, we propose a novel method, IB-RAR, which uses Information Bottleneck (IB) to strengthen adversarial robustness for both adversarial training and non-adversarial-trained methods. We first use the IB theory to build regularizers as learning objectives in the loss function. Then, we filter out unnecessary features of intermediate representation according to their mutual information (MI) with labels, as the network trained with IB provides easily distinguishable MI for its features. Experimental results show that our method can be naturally combined with adversarial training and provides consistently better accuracy on new adversarial examples. Our method improves the accuracy by an average of 3.07% against five adversarial attacks for the VGG16 network, trained with three adversarial training benchmarks and the CIFAR-10 dataset. In addition, our method also provides good robustness for undefended methods, such as training with cross-entropy loss only. Finally, in the absence of adversarial training, the VGG16 network trained using our method and the CIFAR-10 dataset reaches an accuracy of 35.86% against PGD examples, while using all layers reaches 25.61% accuracy.

What problem does this paper attempt to address?

The paper aims to improve the robustness of deep learning networks against adversarial attacks. Specifically, the authors propose IB-RAR (Information Bottleneck as Regularizer for Adversarial Robustness), which uses Information Bottleneck (IB) theory as a regularizer to strengthen adversarial robustness for both adversarially trained and non-adversarially trained models. The main contributions of the paper are as follows:

1. **Applying IB as a regularizer**: By embedding Mutual Information (MI) terms in the loss function, IB-RAR aims to reduce the dependence between the input and the intermediate features while increasing the dependence between the intermediate features and the labels, thereby improving the model's generalization and adversarial robustness.
2. **Selecting robust layers**: The authors find that hidden layers do not contribute equally to adversarial robustness. Through experiments, they identify the layers that help most (called robust layers) and apply the IB regularizer only to these layers.
3. **Removing unnecessary features**: During training, IB-RAR computes the MI between each feature channel and the labels and removes channels that are uncorrelated with the labels or noisy.
4. **Combining with adversarial training**: IB-RAR can be naturally combined with existing adversarial training methods (such as PGD, TRADES, and MART). For VGG16 on CIFAR-10 trained with three adversarial training benchmarks, it improves accuracy by an average of 3.07% against five adversarial attacks.

The experimental results show that IB-RAR improves accuracy on adversarial examples and also provides good robustness without adversarial training. For example, a VGG16 network trained with IB-RAR on CIFAR-10, without adversarial training, reaches 35.86% accuracy under PGD attack, compared with 25.61% when all layers are used.
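
One common way to write the objective described in contribution 1 is the IB Lagrangian, added to the cross-entropy loss over the selected robust layers. The symbols here are notation assumed for illustration: $T_l$ for the features of layer $l$, $R$ for the set of robust layers, and $\alpha$, $\beta$ for the weights. The paper's exact formulation may differ.

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{CE}}
  \;+\; \alpha \sum_{l \in R} \Big[\, I(X; T_l) \;-\; \beta\, I(T_l; Y) \,\Big]
```

Minimizing $I(X; T_l)$ compresses what the features retain about the input, while the negative $I(T_l; Y)$ term rewards features that stay informative about the label.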
Overall, the paper uses IB theory to enhance the adversarial robustness of deep learning models, and it yields further gains when added on top of adversarial training.
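
The regularizer and the channel-filtering step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the summary does not say how MI is estimated, so the sketch swaps in HSIC with a Gaussian kernel as a stand-in dependence measure. The function names, `beta`, `sigma`, and `keep_ratio` are all hypothetical.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix for the rows of x, shape (n, d)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    dist2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # clip tiny negatives
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def hsic(a, b, sigma=1.0):
    """Biased empirical HSIC between two batches (assumed stand-in for MI)."""
    n = a.shape[0]
    a = a.reshape(n, -1).astype(float)  # flatten everything except the batch axis
    b = b.reshape(n, -1).astype(float)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    ka, kb = gaussian_kernel(a, sigma), gaussian_kernel(b, sigma)
    return float(np.trace(ka @ h @ kb @ h) / (n - 1) ** 2)


def ib_regularizer(x, robust_feats, y_onehot, beta=1.0):
    """Sum over the chosen layers of dep(X, T) - beta * dep(T, Y)."""
    return sum(hsic(x, t) - beta * hsic(t, y_onehot) for t in robust_feats)


def channel_mask(feat, y_onehot, keep_ratio=0.5):
    """Score each channel of feat (n, c, ...) against the labels, keep the top ones."""
    n, c = feat.shape[:2]
    scores = np.array([hsic(feat[:, k], y_onehot) for k in range(c)])
    k = max(1, int(round(keep_ratio * c)))
    mask = np.zeros(c)
    mask[np.argsort(scores)[-k:]] = 1.0  # 1 = keep, 0 = drop
    return mask, scores


# Toy batch: channel 0 follows the label, channel 1 is pure noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=64)
y = np.eye(2)[labels]
feat = rng.normal(size=(64, 2, 2, 2))
feat[:, 0] = labels[:, None, None] + 0.1 * rng.normal(size=(64, 2, 2))
mask, scores = channel_mask(feat, y, keep_ratio=0.5)
print(mask, scores)
```

In a training loop, the value returned by `ib_regularizer` would be added to the cross-entropy loss using a differentiable framework, and the mask would multiply the layer's channels. Both of those steps are left out here.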

IB-RAR: Information Bottleneck as Regularizer for Adversarial Robustness

GAAT: Group Adaptive Adversarial Training to Improve the Trade-Off Between Robustness and Accuracy

Singular Regularization with Information Bottleneck Improves Model's Adversarial Robustness

Enhancing Adversarial Transferability via Information Bottleneck Constraints

Learning Robust Variational Information Bottleneck with Reference

Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Feature Augmentation for Adversarial Robustness

ROBY: Evaluating the adversarial robustness of a deep model by its decision boundaries

Improving adversarial robustness using knowledge distillation guided by attention information bottleneck

Improving Adversarial Robustness via Mutual Information Estimation

Tighter Bounds on the Information Bottleneck with Application to Deep Learning

Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense

Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

Enhancing adversarial robustness with randomized interlayer processing

Robust Upper Bounds for Adversarial Training

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Toward Enhanced Robustness in Unsupervised Graph Representation Learning: A Graph Information Bottleneck Perspective

Elastic Information Bottleneck

Transferring Adversarial Robustness Through Robust Representation Matching

Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing