Abstract: Deep neural networks (DNNs) have been demonstrated to be vulnerable to well-crafted \emph{adversarial examples}, which are generated through either well-conceived $\mathcal{L}_p$-norm restricted or unrestricted attacks. Nevertheless, the majority of those approaches assume that adversaries can modify any features as they wish, and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, a modification in income would inevitably impact features like the debt-to-income ratio within a banking system. By considering the underappreciated causal generating process, first, we pinpoint the source of the vulnerability of DNNs via the lens of causality, then give theoretical results to answer \emph{where to attack}. Second, considering the consequences of the attack interventions on the current state of the examples to generate more realistic adversarial examples, we propose CADE, a framework that can generate \textbf{C}ounterfactual \textbf{AD}versarial \textbf{E}xamples to answer \emph{how to attack}. The empirical results demonstrate CADE's effectiveness, as evidenced by its competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.

What problem does this paper attempt to address?

The paper focuses on the vulnerability of Deep Neural Networks (DNNs) to carefully crafted adversarial examples and proposes a new method for generating such examples. Specifically, it addresses two core issues:

1. **Understanding DNN Vulnerability**: Explaining, from a causal inference perspective, why DNNs are susceptible to adversarial examples. This includes analyzing the non-robustness of DNNs to interventional data rather than observational data and providing a theoretical explanation.
2. **Generating More Realistic Adversarial Examples**: Proposing a framework called CADE (Counterfactual ADversarial Examples) to generate more realistic adversarial examples. The process considers not only which features can be modified (i.e., where to attack) but also how to modify them so that the resulting adversarial examples are practically feasible (i.e., how to attack).

### Summary of Contributions

- **Theoretical Analysis**: The authors give a theoretical account of the non-robustness of DNNs to interventional data from a causal perspective, which guides the selection of effective attack targets.
- **CADE Framework**: A method for generating counterfactual adversarial examples that accounts for the interactions between features, making the generated examples more realistic and credible.
- **Experimental Validation**: A series of experiments demonstrates the effectiveness of CADE, with competitive performance in white-box attacks, transfer-based black-box attacks, and random intervention attack scenarios.

### Key Points of the Solution

- **Choosing Attack Targets (Where to Attack)**: Using causal models to identify which variables, when modified, are most likely to cause model prediction errors, and thereby determining effective attack targets.
- **Executing the Attack (How to Attack)**: Using the causal generating process to predict the consequences of variable interventions, thereby generating more plausible adversarial examples. This process follows the three-step "abduction, action, prediction" procedure of counterfactual reasoning.

### Experimental Results

- Experiments on different datasets show that CADE can effectively generate adversarial examples, with competitive performance in white-box and transfer-based black-box attack scenarios.
- For image data, CADE generates adversarial examples by manipulating latent representations rather than directly modifying pixel values, which also performed well in the experiments.

In summary, by introducing a causal perspective, this paper deepens our understanding of DNN vulnerability and proposes a more practical method for generating adversarial examples, which is significant for improving the security and robustness of models.
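The "abduction, action, prediction" procedure can be illustrated with the income and debt-to-income example from the abstract. The following is only a minimal sketch under stated assumptions, not CADE's actual implementation: the structural equation `dti = debt / income + u_dti`, the `Applicant` class, and all function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    income: float  # monthly income
    debt: float    # monthly debt payments
    dti: float     # recorded debt-to-income ratio (may include noise)


def mechanism_dti(income: float, debt: float, u_dti: float) -> float:
    """Assumed structural equation (hypothetical): dti = debt / income + u_dti."""
    return debt / income + u_dti


def abduct(x: Applicant) -> float:
    """Abduction: recover the exogenous noise that explains the observed dti."""
    return x.dti - x.debt / x.income


def counterfactual(x: Applicant, new_income: float) -> Applicant:
    """Counterfactual of x under the intervention do(income := new_income)."""
    u_dti = abduct(x)                                  # 1. abduction
    income_cf = new_income                             # 2. action
    dti_cf = mechanism_dti(income_cf, x.debt, u_dti)   # 3. prediction
    return Applicant(income=income_cf, debt=x.debt, dti=dti_cf)


def naive_perturbation(x: Applicant, new_income: float) -> Applicant:
    """Feature-wise edit that ignores the causal effect of income on dti."""
    return Applicant(income=new_income, debt=x.debt, dti=x.dti)


if __name__ == "__main__":
    x = Applicant(income=4000.0, debt=1200.0, dti=0.32)
    print("counterfactual:", counterfactual(x, 5000.0))
    print("naive:         ", naive_perturbation(x, 5000.0))
```

The naive perturbation leaves the debt-to-income ratio at its old value, which is inconsistent with the new income. The counterfactual version propagates the intervention through the assumed mechanism while keeping the individual's recovered noise term fixed. An attack in this spirit would search over interventions and score each resulting counterfactual against the target model.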

Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

Attack As Defense: Characterizing Adversarial Examples Using Robustness.

Adversarial Robustness Through the Lens of Causality.

Adversarial Robustness through the Lens of Causality

Adversarial Examples: Attacks and Defenses for Deep Learning

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

DCVAE-adv: A Universal Adversarial Example Generation Method for White and Black Box Attacks

A CMA-ES-Based Adversarial Attack on Black-Box Deep Neural Networks

Adversarial Example Games

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression

Towards Imperceptible and Robust Adversarial Example Attacks Against Neural Networks

Generate Adversarial Examples by Spatially Perturbing on the Meaningful Area

Adversarial Attack? Don't Panic

A Data-free Black-box Attack for Generating Transferable Adversarial Examples.

MC-Net: Realistic Sample Generation for Black-Box Attacks

A New Kind of Adversarial Example

How Can We Deal with Adversarial Examples?

Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

AdverseGen: A Practical Tool for Generating Adversarial Examples to Deep Neural Networks Using Black-Box Approaches

ADSAttack: an Adversarial Attack Algorithm Via Searching Adversarial Distribution in Latent Space