Abstract: Deep convolutional neural network (DCNN) models are vulnerable to examples with small perturbations. Adversarial training (AT) is a widely used approach for enhancing the robustness of DCNN models through data augmentation. In AT, DCNN models are trained on clean examples together with adversarial examples (AEs) generated by a specific attack method, with the aim of defending against unseen AEs. In practice, however, the trained models are often fooled by AEs generated by novel attack methods. This raises a natural question: can a DCNN model learn features that are insensitive to small perturbations, and thereby defend itself regardless of the attack method used? As an initial step toward answering this question, this paper proposes a shallow binary feature module (SBFM) that can be integrated into any popular backbone. The SBFM consists of two types of layers: a Sobel layer and a threshold layer. The Sobel layer produces four parallel feature maps representing horizontal, vertical, and two diagonal edge features. The threshold layer converts these edge features into binary features, which are then fed, together with the features learned by the backbone, into the fully connected layers for classification. We integrate SBFM into VGG16 and ResNet34 and conduct experiments on multiple datasets. Experimental results show that under the FGSM attack with $\epsilon=8/255$, the SBFM-integrated models achieve on average 35\% higher accuracy than the original models, and on the CIFAR-10 and TinyImageNet datasets they achieve an average classification accuracy of 75\%. This work suggests that enhancing the robustness of DCNN models through feature learning is a promising direction.
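The two SBFM layers described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the four 3x3 kernels are fixed Sobel-style weights (the paper's Sobel layer may use different or learned kernels), the threshold `tau = 0.5` is a hypothetical hyperparameter, the input is a single grayscale image with values in [0, 1], and `backbone_features` is a hypothetical stand-in for the pooled VGG16/ResNet34 output. The demo applies random ±ε sign noise with ε = 8/255 (the same L∞ budget as the FGSM setting above, but not a gradient-based attack) to a step-edge image and counts how many binary features flip.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Four 3x3 Sobel-style kernels. ASSUMPTION: fixed, hand-set weights; the
# paper's Sobel layer may use different or learnable kernels.
SOBEL_KERNELS = np.array([
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],    # horizontal gradient (vertical edges)
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],    # vertical gradient (horizontal edges)
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],    # diagonal gradient (45 degrees)
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],    # diagonal gradient (135 degrees)
], dtype=np.float64)


def sobel_layer(image):
    """Apply the four kernels to a 2-D grayscale image (valid cross-correlation).

    Returns an array of shape (4, H - 2, W - 2): one edge map per kernel.
    """
    windows = sliding_window_view(image, (3, 3))          # (H-2, W-2, 3, 3)
    return np.einsum("ijkl,ckl->cij", windows, SOBEL_KERNELS)


def threshold_layer(edge_maps, tau=0.5):
    """Binarize edge maps: 1 where |response| exceeds tau, else 0.

    ASSUMPTION: tau is a hypothetical fixed hyperparameter for illustration.
    """
    return (np.abs(edge_maps) > tau).astype(np.float64)


def sbfm_fuse(image, backbone_features, tau=0.5):
    """Concatenate backbone features with the flattened binary edge features.

    `backbone_features` is a hypothetical stand-in for the pooled backbone
    output; the fused vector would feed the fully connected classifier.
    """
    binary = threshold_layer(sobel_layer(image), tau)
    return np.concatenate([backbone_features.ravel(), binary.ravel()])


if __name__ == "__main__":
    # Vertical step edge: left half black, right half white, values in [0, 1].
    img = np.zeros((8, 8))
    img[:, 4:] = 1.0
    clean_bits = threshold_layer(sobel_layer(img))

    # Random +/- eps sign noise with the FGSM budget eps = 8/255.
    # NOT gradient-based, so this is not an actual FGSM attack.
    eps = 8 / 255
    rng = np.random.default_rng(0)
    noisy = np.clip(img + eps * rng.choice([-1.0, 1.0], size=img.shape), 0.0, 1.0)
    noisy_bits = threshold_layer(sobel_layer(noisy))

    print(clean_bits.shape, int(clean_bits.sum()))   # (4, 6, 6) 36
    print(int((clean_bits != noisy_bits).sum()))      # 0 flipped bits
    print(sbfm_fuse(img, np.ones(512)).shape)          # (656,)
```

Because the absolute weights of every kernel sum to 8, a perturbation bounded by ε can shift any edge response by at most 8ε ≈ 0.25, so responses well above or below `tau` keep their binary value. Responses close to `tau` can still flip, so this toy case illustrates the intuition behind binary features rather than proving robustness in general.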

Flooding-X: Improving BERT's Resistance to Adversarial Attacks Via Loss-Restricted Fine-Tuning

GAAT: Group Adaptive Adversarial Training to Improve the Trade-Off Between Robustness and Accuracy

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

RoChBert: Towards Robust BERT Fine-tuning for Chinese

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

Fast Adversarial Training against Textual Adversarial Attacks

TextAT: Adversarial Training for Natural Language Understanding with Token-Level Perturbation

PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack

Efficient Adversarial Training with Robust Early-Bird Tickets

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation

Improving the Robustness of Deep Convolutional Neural Networks Through Feature Learning

Feature Augmentation for Adversarial Robustness

AdaFlood: Adaptive Flood Regularization

Better Robustness by More Coverage: Adversarial and Mixup Data Augmentation for Robust Finetuning

Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning

Token-Aware Virtual Adversarial Training in Natural Language Understanding

Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention

Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder