Abstract: Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are hard to fool with subtle manipulations, since we make judgments based only on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to identify the perturbations as non-causative factors and make predictions based only on the label-causative factors. Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning towards a novel causal information bottleneck objective. Empirically, CausalDiff significantly outperforms state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark).
What problem does this paper attempt to address?
This paper addresses the vulnerability of neural classifiers to adversarial attacks, especially attack types not seen during training. Existing defenses (certified defense, adversarial training, and adversarial purification) have had some success, but each has significant limitations against unknown attacks. Specifically:
1. **Certified Defense**: Due to the small theoretically certifiable robustness region, its practicality is limited.
2. **Adversarial Training**: Adding adversarial samples to the training set improves the model's robustness to the specific attacks it was trained on, but its effectiveness drops significantly against unseen attacks.
3. **Adversarial Purification**: These methods are not tied to specific attacks, but it is difficult to determine the optimal denoising level, so they remain ineffective against unknown attacks of varying intensity.
To address these problems, the paper proposes a new causal diffusion framework (CausalDiff), which aims to enhance robustness to various unseen attacks by modeling both the essential factors (Y-causative factors) and the non-essential factors (Y-non-causative factors) involved in generating the data and its label. Specifically, CausalDiff attempts to separate the non-causative factors out of an adversarial sample and make predictions based only on the causative factors, thereby improving the model's robustness and generalization ability.
### Main Contributions
1. **Novel Causal Diffusion Framework (CausalDiff)**: Models the data-generation process in the native domain to distinguish label-causative factors from non-causative factors, enhancing robustness against unknown attacks.
2. **Causal Information Bottleneck (CIB) Objective**: Proposes an optimization objective that separates Y-causative from Y-non-causative factors while training the causal model, and provides the corresponding inference algorithm.
3. **Significantly Outperforms Existing Methods**: Experimental results show that CausalDiff performs well under various unseen attacks and significantly outperforms existing state-of-the-art methods.
### Core Idea of the Solution
Inspired by the human decision-making process, the paper observes that humans base their judgments only on key features and ignore irrelevant factors. The authors therefore propose to learn a model that can distinguish the essential features that determine the category from the non-essential ones. By combining a structural causal model (SCM) with a diffusion model, CausalDiff aims to remove the influence of adversarial perturbations at inference time and thereby make more robust predictions.
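The intuition can be illustrated with a minimal toy sketch in Python (NumPy). This is not the CausalDiff implementation: it involves no diffusion model, no SCM, and no learned disentanglement. The two-coordinate data, the names `s`, `z`, `predict_all`, `predict_causal`, and `perturb_noncausal`, and the budget `eps=3.0` are all hypothetical, and the perturbation is assumed to land entirely on the non-causative coordinate, which is the idealized case the paper aims for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption for illustration only):
#   s is the label-causative factor, z is a non-causative factor
#   that happens to be correlated with the label.
n = 1000
y = rng.integers(0, 2, size=n)                    # binary labels
sign = 2 * y - 1                                  # labels mapped to -1 / +1
s = sign + 0.1 * rng.standard_normal(n)           # tightly tied to the label
z = 0.5 * sign + rng.standard_normal(n)           # weak, noisy correlate
x = np.stack([s, z], axis=1)

def predict_all(x):
    """Classifier that relies on both coordinates."""
    return (x[:, 0] + x[:, 1] > 0).astype(int)

def predict_causal(x):
    """Classifier that relies only on the label-causative coordinate."""
    return (x[:, 0] > 0).astype(int)

def perturb_noncausal(x, y, eps):
    """Shift only the non-causative coordinate against the true label."""
    x_adv = x.copy()
    x_adv[:, 1] -= eps * (2 * y - 1)
    return x_adv

def accuracy(pred, y):
    return float(np.mean(pred == y))

x_adv = perturb_noncausal(x, y, eps=3.0)
acc_all_clean = accuracy(predict_all(x), y)
acc_all_adv = accuracy(predict_all(x_adv), y)
acc_causal_clean = accuracy(predict_causal(x), y)
acc_causal_adv = accuracy(predict_causal(x_adv), y)

print("all features:   clean", acc_all_clean, "perturbed", acc_all_adv)
print("causal only:    clean", acc_causal_clean, "perturbed", acc_causal_adv)
```

In this toy setting, the classifier restricted to the causative coordinate is unaffected by construction, while the one that mixes in the non-causative coordinate degrades. The hard part that CausalDiff addresses, and that this sketch skips, is inferring which factors are causative from a raw perturbed input.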
### Experimental Verification
Experimental results show that CausalDiff significantly improves robustness against various unseen attacks on multiple benchmark datasets (CIFAR-10, CIFAR-100, and GTSRB). The reported average robustness is 86.39% on CIFAR-10 (+4.01% over prior state-of-the-art defenses), 56.25% on CIFAR-100 (+3.13%), and 82.62% on GTSRB (+4.93%).
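As a quick sanity check on what these gains imply, the short Python sketch below backs out the strongest prior result on each dataset. It assumes the parenthesized gains in the abstract are absolute percentage-point differences over the best compared method; the abstract does not state this explicitly, so the derived values are illustrative rather than reported numbers. The dictionary names are hypothetical.

```python
# Figures taken from the abstract: (CausalDiff average robustness, reported gain).
reported = {
    "CIFAR-10": (86.39, 4.01),
    "CIFAR-100": (56.25, 3.13),
    "GTSRB": (82.62, 4.93),
}

# Assumption: each gain is an absolute percentage-point difference,
# so the implied prior best is simply robustness minus gain.
implied_prior_best = {
    name: round(robustness - gain, 2)
    for name, (robustness, gain) in reported.items()
}

for name, value in implied_prior_best.items():
    print(f"{name}: implied prior best = {value:.2f}%")
```

If the gains were relative improvements instead, the implied baselines would differ, so the original paper's tables remain the authoritative source.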
In summary, the paper's main goal is to overcome the weakness of existing defense methods against unknown attacks by introducing a causal diffusion model, thereby improving the overall robustness and security of neural classifiers.