Abstract: Attackers can deliberately perturb classifiers' inputs with subtle noise, altering their final predictions. Among the proposed countermeasures, adversarial purification employs generative networks to preprocess input images and filter out adversarial noise. In this study, we propose specific generators, called Multiple Latent Variable Generative Models (MLVGMs), for adversarial purification. These models possess multiple latent variables that naturally disentangle coarse from fine features. Taking advantage of this property, we autoencode images to maintain class-relevant information while discarding and re-sampling any detail, including adversarial noise. The procedure is completely training-free, exploring the generalization abilities of pre-trained MLVGMs on the adversarial purification downstream task. Despite the lack of large MLVGMs trained on billions of samples, we show that smaller MLVGMs are already competitive with traditional methods and can be used as foundation models. Official code is released at https://github.com/SerezD/gen_adversarial.

### What problem does this paper attempt to solve?

This paper addresses the threat that adversarial attacks pose to deep-learning models, especially image classifiers. Specifically, the authors propose a new adversarial purification framework based on Multiple Latent Variable Generative Models (MLVGMs) to defend against these attacks.

#### Background of Adversarial Attacks

Adversarial attacks cause well-trained classifiers to make incorrect predictions by adding tiny, imperceptible perturbations to the input data. This poses a serious threat to the security and reliability of deep-learning models, especially in practical applications such as image classification on smartphones and traffic sign recognition.

#### Deficiencies of Existing Defense Methods

Existing defenses include adversarial training and adversarial purification based on generative networks. However, both have limitations:

1. **Adversarial Training**: Although it can improve the robustness of the model, it is computationally expensive and prone to overfitting to the specific attacks used during training.
2. **Traditional Purification Methods**: They usually require generators trained specifically for the target task, which adds computational overhead.

#### The Proposed New Method

To address these problems, the authors propose a novel auto-encoding purification framework that uses pre-trained MLVGMs for adversarial purification. Its main features are:

- **No Additional Training Required**: Pre-trained MLVGMs are used directly, avoiding any additional training cost.
- **Advantages of Multiple Latent Variables**: MLVGMs have multiple latent variables that naturally disentangle global (coarse) from local (fine) features, so adversarial noise can be removed more effectively while class-relevant information is retained.
- **Hyperparameter Optimization**: The interpolation coefficients \(\alpha_i\) are chosen either by Bayesian Optimization or heuristically (e.g., following a linear or cosine function), balancing the need to preserve the original information against the need to remove noise.

#### Specific Steps of the Method

1. **Encoding Stage**: Encode the input image into multiple latent variables \(z^e_0, z^e_1, \ldots, z^e_{N-1}\), which contain class-relevant features and possibly adversarial noise.
2. **Sampling Stage**: Sample new latent variables \(z^s_0, z^s_1, \ldots, z^s_{N-1}\) from the prior distribution of the pre-trained generator; these carry no adversarial information.
3. **Interpolation Stage**: Linearly interpolate each pair of latent variables \(z^e_i\) and \(z^s_i\):
   \[
   z_i = (1 - \alpha_i)\, z^e_i + \alpha_i\, z^s_i
   \]
   where \(0 \leq \alpha_i \leq 1\); \(\alpha_i = 0\) keeps only the encoded information, and \(\alpha_i = 1\) keeps only the newly sampled information.
4. **Decoding Stage**: Decode the interpolated latent variables with the generative model to obtain the purified image.

#### Experimental Verification

The authors ran experiments on multiple datasets and classification tasks, including binary classification (male/female), fine-grained identity classification (100 classes), and car type classification (4 classes). The results show that although the MLVGMs used are not foundation models trained on billions of samples, they are competitive with specifically designed techniques while requiring no additional training.

In conclusion, this paper proposes a novel and efficient adversarial purification framework that uses pre-trained MLVGMs to defend against adversarial attacks without any additional training, demonstrating its potential for practical applications.
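The four purification steps and the heuristic \(\alpha_i\) schedules can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the official repository linked in the abstract holds that): latents are plain lists of floats, they are assumed to be ordered from coarsest (\(i = 0\)) to finest (\(i = N-1\)), and `encode`, `decode`, and `sample_prior` are hypothetical callables standing in for a real pre-trained MLVGM. The `linear_alphas` and `cosine_alphas` helpers are one plausible reading of the "linear or cosine" heuristics; the exact functions used in the paper may differ. The toy two-level model at the end exists only to make the sketch runnable.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Latents = List[Vector]  # [z_0, ..., z_{N-1}]; ASSUMED ordered coarse -> fine


def linear_alphas(n: int, a_min: float = 0.0, a_max: float = 1.0) -> List[float]:
    """Hypothetical linear schedule: alpha_i rises from a_min (i = 0) to a_max (i = n - 1)."""
    if n == 1:
        return [a_min]
    return [a_min + (a_max - a_min) * i / (n - 1) for i in range(n)]


def cosine_alphas(n: int, a_min: float = 0.0, a_max: float = 1.0) -> List[float]:
    """Hypothetical cosine schedule: same endpoints as linear, flatter near both ends."""
    if n == 1:
        return [a_min]
    return [a_min + (a_max - a_min) * (1 - math.cos(math.pi * i / (n - 1))) / 2
            for i in range(n)]


def interpolate(z_e: Sequence[float], z_s: Sequence[float], alpha: float) -> Vector:
    """Element-wise z_i = (1 - alpha_i) * z^e_i + alpha_i * z^s_i."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(z_e) != len(z_s):
        raise ValueError("encoded and sampled latents must have the same size")
    return [(1 - alpha) * e + alpha * s for e, s in zip(z_e, z_s)]


def purify(x: Vector,
           encode: Callable[[Vector], Latents],
           decode: Callable[[Latents], Vector],
           sample_prior: Callable[[], Latents],
           alphas: Sequence[float]) -> Vector:
    """Encode -> sample -> interpolate -> decode."""
    z_enc = encode(x)        # step 1: z^e_0 ... z^e_{N-1}
    z_smp = sample_prior()   # step 2: z^s_0 ... z^s_{N-1}
    if not len(z_enc) == len(z_smp) == len(alphas):
        raise ValueError("need one alpha per latent variable")
    z_mix = [interpolate(e, s, a) for e, s, a in zip(z_enc, z_smp, alphas)]  # step 3
    return decode(z_mix)     # step 4


# Toy two-level "generator" (hypothetical, illustration only):
# coarse latent = mean of x, fine latent = per-element residuals around the mean.
def toy_encode(x: Vector) -> Latents:
    m = sum(x) / len(x)
    return [[m], [v - m for v in x]]


def toy_decode(z: Latents) -> Vector:
    return [z[0][0] + r for r in z[1]]


def make_toy_prior(dim: int, seed: int = 0) -> Callable[[], Latents]:
    rng = random.Random(seed)
    return lambda: [[rng.gauss(0.0, 1.0)], [rng.gauss(0.0, 0.1) for _ in range(dim)]]


if __name__ == "__main__":
    x = [1.0, 1.2, 0.9, 1.05]
    prior = make_toy_prior(len(x))
    # Keep the coarse latent (alpha_0 = 0), fully resample the fine one (alpha_1 = 1).
    print(purify(x, toy_encode, toy_decode, prior, [0.0, 1.0]))
    print(linear_alphas(4), cosine_alphas(4))
```

Setting every \(\alpha_i = 0\) reduces the procedure to plain autoencoding of the (possibly attacked) input, while \(\alpha_i = 1\) for all \(i\) discards the input entirely. Intermediate schedules trade off the two, which is why the coefficients are tuned (by Bayesian Optimization or a heuristic) rather than fixed.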

Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks
