DifFilter: Defending Against Adversarial Perturbations with Diffusion Filter

Yong Chen, Xuedong Li, Peng Hu, Dezhong Peng, Xu Wang
DOI: https://doi.org/10.1109/tifs.2024.3422923
IF: 7.231
2024-01-01
IEEE Transactions on Information Forensics and Security
Abstract: The inherent vulnerability of deep learning to adversarial examples poses a significant security challenge. Although existing defense methods have partially mitigated the harm caused by adversarial attacks, they still fall short of practical needs because of their high cost, high latency, and poor defense performance. In this paper, we propose an advanced plug-and-play adversarial purification model called DifFilter. Specifically, we use the superior generative properties of diffusion models to denoise adversarial perturbations and recover clean images. To make Gaussian noise disrupt adversarial perturbations while preserving the real semantic information in the input image, we extend forward diffusion to an infinite number of noise scales, so that the distribution of the perturbed data evolves with increasing noise according to stochastic differential equations. In the reverse denoising process, we develop a score-based model learning method to restore the input prior distribution to the data distribution of the original clean sample, resulting in stronger purification effects. Additionally, we propose an efficient sampling method to accelerate the reverse process, greatly reducing the time cost of purification. We conduct extensive experiments to evaluate the defense generalization performance of DifFilter. The results demonstrate that our method not only surpasses existing defense methods in robustness under strong adaptive and black-box attacks but also achieves higher certified accuracy than the baseline. Furthermore, DifFilter can be combined with adversarial training to further improve defense robustness.
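The diffuse-then-denoise idea described in the abstract can be illustrated with a small toy sketch. It is not DifFilter itself: it assumes a one-dimensional Gaussian "clean data" distribution, a variance-preserving SDE with a constant noise rate, an analytic score in place of a learned score model, and plain Euler-Maruyama integration rather than the paper's efficient sampler. All names and values (`BETA`, `MU`, `SIGMA`, `alpha`, `score`, `purify`, `t_star`) are hypothetical and chosen only for illustration.

```python
import numpy as np

BETA = 2.0             # constant noise rate of the toy VP-SDE (assumed value)
MU, SIGMA = 2.0, 0.1   # toy "clean data" distribution N(MU, SIGMA^2) (assumed)


def alpha(t):
    """Signal scaling of the forward SDE: E[x_t | x_0] = alpha(t) * x_0."""
    return np.exp(-0.5 * BETA * t)


def score(x, t):
    """Analytic score of the diffused clean distribution at time t.

    Stand-in for a learned score network; exact only for this Gaussian toy.
    """
    a = alpha(t)
    var = SIGMA**2 * a**2 + (1.0 - a**2)
    return -(x - MU * a) / var


def purify(x_adv, t_star=0.5, n_steps=200, rng=None):
    """Diffuse inputs to time t_star, then integrate the reverse SDE back to 0."""
    rng = np.random.default_rng() if rng is None else rng
    a = alpha(t_star)
    # Forward diffusion in closed form: Gaussian noise swamps the perturbation.
    x = a * x_adv + np.sqrt(1.0 - a**2) * rng.standard_normal(x_adv.shape)
    dt = t_star / n_steps
    for i in range(n_steps):
        t = t_star - i * dt
        # Reverse-time drift for dx = -0.5*BETA*x dt + sqrt(BETA) dw.
        drift = 0.5 * BETA * x + BETA * score(x, t)
        x = x + drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(x.shape)
    return x


# Toy demo: shift clean samples by a fixed "adversarial" offset, then purify.
rng = np.random.default_rng(0)
x_clean = MU + SIGMA * rng.standard_normal(1000)
x_adv = x_clean + 0.5
x_pur = purify(x_adv, rng=rng)
```

In this toy setting, `t_star` controls the trade-off the abstract alludes to: more noise removes more of the perturbation but also discards more of the original input, so the purified samples are pulled back toward the clean distribution rather than recovered exactly.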