Unifying Bilateral Filtering and Adversarial Training for Robust Neural Networks

Neale Ratzlaff, Li Fuxin
DOI: https://doi.org/10.48550/arXiv.1804.01635
2018-10-09
Abstract: Recent analysis of deep neural networks has revealed their vulnerability to carefully structured adversarial examples. Many effective algorithms exist to craft these adversarial examples, but performant defenses seem to be far away. In this work, we explore the use of edge-aware bilateral filtering as a projection back to the space of natural images. We show that bilateral filtering is an effective defense in multiple attack settings, where the strength of the adversary gradually increases. In the case of an adversary who has no knowledge of the defense, bilateral filtering can remove more than 90% of adversarial examples from a variety of different attacks. To evaluate against an adversary with complete knowledge of our defense, we adapt the bilateral filter as a trainable layer in a neural network and show that adding this layer makes ImageNet images significantly more robust to attacks. When trained under a framework of adversarial training, we show that the resulting model is hard to fool with even the best attack methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the vulnerability of deep neural networks to carefully crafted adversarial examples. Specifically, the authors explore edge-aware bilateral filtering as a way to project inputs back onto the space of natural images and thereby defend against adversarial attacks. The paper studies how effective bilateral filtering is across attack settings of increasing adversary strength, and proposes a new model, BFNet, which integrates the bilateral filter into the network as a trainable layer to further improve robustness against adversarial attacks.

### Main Contributions

1. **Bilateral Filter as a Defense Mechanism**:
   - The paper shows that bilateral filtering is an effective defense in multiple attack settings. When the attacker has no knowledge of the defense, the filter removes more than 90% of the adversarial examples produced by a variety of attacks.
   - By adjusting the filter's parameters, an image corrupted by an adversarial attack can often be restored to its original classification.
2. **BFNet Model**:
   - BFNet is an end-to-end model that integrates the bilateral filter into the neural network as a differentiable layer.
   - Experiments show that BFNet is markedly more robust under multiple white-box attacks; on ImageNet in particular, it substantially improves robustness against L∞ and L2 attacks.
3. **Combination with Adversarial Training**:
   - Combining the bilateral filter with adversarial training further improves the model's robustness.
   - On MNIST and CIFAR-10, adversarial training combined with the bilateral filter achieves state-of-the-art performance.

### Method Overview

1. **Bilateral Filter**:
   - The bilateral filter is a nonlinear Gaussian filter that smooths the image while preserving sharp edges.
   - It is defined as:

     \[
     I_{\text{filtered}}(p) = \frac{1}{W_p} \sum_{q \in \Omega} G_s(\|p - q\|)\, G_r(\|I_p - I_q\|)\, I_q
     \]

     where the normalization term is

     \[
     W_p = \sum_{q \in \Omega} G_s(\|p - q\|)\, G_r(\|I_p - I_q\|)
     \]

     and the spatial and range kernels are Gaussians:

     \[
     G_s(x) = \exp\left(-\frac{x^2}{2\sigma_s^2}\right), \qquad G_r(x) = \exp\left(-\frac{x^2}{2\sigma_r^2}\right)
     \]

     Here p is the pixel being filtered, Ω is its neighborhood, I_p and I_q are pixel intensities, and σ_s and σ_r set the spatial and range extent of the filter.
2. **BFNet Model**:
   - BFNet applies the bilateral filter to the input image and passes the filtered image to the CNN, which improves the model's robustness.
   - The bilateral filter is implemented with the permutohedral lattice, which reduces its computational cost from O(n^2) to O(n) in the number of pixels while keeping it differentiable.
3. **Adversarial Training**:
   - In adversarial training, adversarial examples are generated and added to the training set to strengthen the model's robustness.
   - Experiments show that, when trained this way, BFNet holds up well against multiple strong adversarial attacks, especially on MNIST and CIFAR-10.

### Experimental Results

- **Adaptive Filtering Model**:
  - A small network is trained to predict the parameters of the bilateral filter; experiments show that this model can effectively restore images corrupted by adversarial attacks.
- **BFNet's Performance on ImageNet**:
  - BFNet is significantly more robust to L2 and L∞ attacks: adversarial examples need larger perturbations to fool it.
- **Adversarial Training on MNIST and CIFAR-10**:
  - Adversarial training combined with the bilateral filter achieves state-of-the-art performance on MNIST and CIFAR-10.

In general, this paper, by introducing bilateral filtering...
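The bilateral filtering equation in the Method Overview can be checked with a direct, brute-force evaluation. The sketch below is an illustrative NumPy version, not the permutohedral-lattice implementation BFNet uses. It assumes a single-channel image, takes Ω to be a square window of hypothetical half-width `radius`, and replicates border pixels at the image edges; the function name `bilateral_filter` is likewise hypothetical. Its cost grows with the window size, which is exactly the cost the lattice is meant to avoid.

```python
import numpy as np


def bilateral_filter(image, sigma_s, sigma_r, radius):
    """Brute-force bilateral filter for a 2-D grayscale image (illustrative sketch).

    Evaluates I_filtered(p) = (1 / W_p) * sum_q G_s(||p - q||) * G_r(|I_p - I_q|) * I_q
    with Omega assumed to be the (2 * radius + 1)^2 square window around p.
    """
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape
    # Replicate border pixels so every p has a full window (an assumed border rule).
    padded = np.pad(image, radius, mode="edge")
    numerator = np.zeros_like(image)
    w_p = np.zeros_like(image)  # normalization term W_p
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Spatial kernel G_s depends only on the offset q - p = (dy, dx).
            g_s = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
            # neighbor[y, x] holds I_q for q = (y + dy, x + dx).
            neighbor = padded[radius + dy : radius + dy + height,
                              radius + dx : radius + dx + width]
            # Range kernel G_r depends on the intensity difference I_p - I_q.
            g_r = np.exp(-((image - neighbor) ** 2) / (2.0 * sigma_r ** 2))
            numerator += g_s * g_r * neighbor
            w_p += g_s * g_r
    return numerator / w_p


# Usage: a noisy vertical step edge (0 on the left half, 1 on the right half).
rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[:, 8:] = 1.0
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
smoothed = bilateral_filter(noisy, sigma_s=2.0, sigma_r=0.1, radius=3)
```

With σ_r well below the step height, the range kernel gives almost no weight to pixels across the edge, so each side is averaged only with itself and the edge survives. Raising σ_r far above the intensity range makes G_r close to 1 everywhere, and the filter degenerates into a Gaussian blur truncated to the window.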
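The adversarial-training step described under the Method Overview (generate adversarial examples, then add them to the training set) can be illustrated on a toy problem. The sketch below is a stand-in built on stated assumptions, not the paper's setup: it uses a linear logistic-regression classifier, the fast gradient sign method (FGSM) as the attack, and synthetic 2-D data, and the names `fgsm` and `train_adversarial` are hypothetical. BFNet would additionally place a bilateral-filter layer in front of the classifier, which is omitted here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(X, y, w, eps):
    """FGSM for the logistic loss log(1 + exp(-y * (w.x + b))), labels y in {-1, +1}.

    grad_x loss = -y * sigmoid(-y * (w.x + b)) * w. The sigmoid factor is positive,
    so sign(grad_x) = -y * sign(w), and the bias b does not affect the step.
    """
    return X + eps * (-y[:, None] * np.sign(w)[None, :])


def train_adversarial(X, y, eps, lr=0.1, epochs=200, seed=0):
    """Gradient descent on clean + FGSM examples, regenerated every epoch (toy sketch)."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        X_adv = fgsm(X, y, w, eps)           # attack the current model
        X_all = np.vstack([X, X_adv])        # add adversarial copies to the training set
        y_all = np.concatenate([y, y])
        margin = y_all * (X_all @ w + b)
        coeff = -y_all * sigmoid(-margin)    # d loss / d (w.x + b)
        w = w - lr * (X_all.T @ coeff) / len(y_all)
        b = b - lr * coeff.mean()
    return w, b


# Hypothetical toy data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])

w, b = train_adversarial(X, y, eps=0.3)
clean_acc = np.mean(np.sign(X @ w + b) == y)
adv_acc = np.mean(np.sign(fgsm(X, y, w, 0.3) @ w + b) == y)
print(f"clean accuracy {clean_acc:.2f}, FGSM accuracy {adv_acc:.2f}")
```

For a linear model the FGSM step has a closed form: it lowers every example's margin y(w·x + b) by exactly ε‖w‖₁, so accuracy on the perturbed data can never exceed clean accuracy. Training on the perturbed copies therefore rewards solutions whose margins exceed that penalty.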