Abstract: We propose a novel framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE), for generating adversarial examples that can effectively mislead neural network classifiers while maintaining visual imperceptibility and preserving the semantic similarity to the original class label. Our method leverages the text-to-image generation capabilities of the Stable Diffusion model by manipulating token embeddings corresponding to the specified class in its latent space. These token embeddings guide the generation of adversarial images that maintain high visual fidelity. The SD-MIAE framework consists of two phases: (1) an initial adversarial optimization phase that modifies token embeddings to produce misclassified yet natural-looking images and (2) a momentum-based optimization phase that refines the adversarial perturbations. By introducing momentum, our approach stabilizes the optimization of perturbations across iterations, enhancing both the misclassification rate and visual fidelity of the generated adversarial examples. Experimental results demonstrate that SD-MIAE achieves a high misclassification rate of 79%, improving by 35% over the state-of-the-art method while preserving the imperceptibility of adversarial perturbations and the semantic similarity to the original class label, making it a practical method for robust adversarial evaluation.
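The first phase, which modifies a class token embedding so that the generated image is misclassified while staying natural-looking, can be illustrated with a toy, runnable sketch. It is not the authors' implementation and rests on several assumptions: Stable Diffusion and the target classifier are replaced by hypothetical linear stand-ins (`G` and `W`), gradients are estimated by central finite differences instead of automatic differentiation, and the naturalness regularizer is assumed to take the form R = 1 - cos(ê, e), so that minimizing it keeps the perturbed embedding ê pointing in the same direction as the original e. The sketch minimizes -CE(F(G(ê)), y) + λ·R by plain gradient descent over ê. All function names are illustrative.

```python
import numpy as np


def cross_entropy(logits, y):
    """Numerically stable softmax cross-entropy for one example with label y."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[y])


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def objective(e_hat, e_orig, G, W, y, lam):
    """-CE(F(G(e_hat)), y) + lam * R, with R = 1 - cos(e_hat, e_orig) (assumed form)."""
    x = G @ e_hat   # hypothetical linear stand-in for the Stable Diffusion generator
    logits = W @ x  # hypothetical linear stand-in for the target classifier F
    reg = 1.0 - cosine_similarity(e_hat, e_orig)
    return -cross_entropy(logits, y) + lam * reg


def numerical_grad(f, v, h=1e-5):
    """Central finite-difference gradient (stands in for autograd in this sketch)."""
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (f(v + step) - f(v - step)) / (2.0 * h)
    return grad


def optimize_token_embedding(e_orig, G, W, y, lam=1.0, lr=0.05, steps=20):
    """Gradient descent on the phase-1 objective, starting from the original embedding."""
    e_hat = e_orig.copy()
    f = lambda v: objective(v, e_orig, G, W, y, lam)
    for _ in range(steps):
        e_hat = e_hat - lr * numerical_grad(f, e_hat)
    return e_hat


# Toy usage: a 4-d "token embedding" that the toy classifier assigns to class 0.
e_orig = np.array([3.0, 1.5, -0.9, 0.6])
G = np.eye(4)
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
e_adv = optimize_token_embedding(e_orig, G, W, y=0)
print("CE before:", cross_entropy(W @ (G @ e_orig), 0))
print("CE after: ", cross_entropy(W @ (G @ e_adv), 0))
print("cosine similarity to original:", cosine_similarity(e_adv, e_orig))
```

In a real pipeline, `G @ e_hat` would be a diffusion sampling pass conditioned on the modified prompt embedding and the finite-difference loop would be backpropagation; only the shape of the objective (push the classification loss up, keep the embedding close to the original) carries over. Raising `lam` trades attack strength for closeness to the original embedding.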

What problem does this paper attempt to address?

The problem this paper addresses is how to generate adversarial examples that effectively mislead neural network classifiers while remaining visually imperceptible and semantically similar to the original class label. To this end, the authors propose a new framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE), which targets two challenges:

1. **Visual fidelity of adversarial examples**: Existing generation methods tend to produce unnatural artifacts when perturbing token embeddings in high-dimensional latent spaces, which reduces the effectiveness of the attack and makes the perturbations easier to detect.
2. **Semantic consistency**: The generated adversarial examples must remain semantically similar to the original class label to stay deceptive.

To meet these goals, SD-MIAE works in two stages:

- **Initial adversarial optimization stage**: The token embeddings of the specified class are modified to generate images that look natural but are misclassified.
- **Momentum optimization stage**: A momentum mechanism refines the adversarial perturbations, stabilizing them across iterations and thereby improving both the misclassification rate and visual fidelity.

Experimental results show that SD-MIAE achieves a misclassification rate of 79%, a 35% improvement over the state-of-the-art method, while preserving the imperceptibility of the perturbations and their semantic similarity to the original class label. This makes SD-MIAE a practical method for evaluating the robustness of AI systems.

### Formula Summary

1. **Loss function**:

\[ \min_{\hat{e}_{k}^{\text{token}}} \; -\ell\big(F(G(z; e_{\text{text}})), y\big) + \lambda \cdot R\big(\hat{e}_{k}^{\text{token}}, e_{k}^{\text{token}}\big) \]

where:
- \( \ell(F(G(z; e_{\text{text}})), y) \) is the adversarial (classification) loss of the classifier \( F \) on the image generated by \( G \) from latent \( z \) and the text embedding \( e_{\text{text}} \), which contains the perturbed token embedding; minimizing its negative drives \( F \) away from the original label \( y \).
- \( R(\hat{e}_{k}^{\text{token}}, e_{k}^{\text{token}}) \) is a cosine-similarity-based regularization term that keeps the perturbed token embedding \( \hat{e}_{k}^{\text{token}} \) close to the original \( e_{k}^{\text{token}} \), preserving the natural appearance of the image; \( \lambda \) weights it against the adversarial loss.

2. **Momentum update**:

\[ m_{t+1} = \mu \cdot m_t + \frac{\nabla_x \ell(F(x_t), y)}{\|\nabla_x \ell(F(x_t), y)\|_1} \]

\[ x_{t+1} = x_t + \alpha \cdot \operatorname{sign}(m_{t+1}) \]

where:
- \( m_t \) is the momentum term (the accumulated gradient), \( \mu \) is its decay factor, and \( x_t \) is the adversarial image at iteration \( t \).
- \( \alpha \) is the step size, set to \( \epsilon / T \) so that the total perturbation satisfies the \( L_\infty \) constraint.
- \( \epsilon \) is the perturbation budget, and \( T \) is the total number of iterations.

Through these improvements, SD-MIAE not only generates more effective adversarial examples but also keeps them visually imperceptible and semantically consistent, providing a powerful tool for evaluating and enhancing the security of deep-learning models.
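The momentum update can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the classifier is a hypothetical linear softmax model whose input gradient has a closed form, `momentum_attack` and `ce_grad_wrt_input` are illustrative names, μ = 1.0 is simply a value commonly used in MI-FGSM-style attacks, and clipping to [0, 1] assumes normalized pixel values. Because α = ε/T and each step moves every pixel by at most α, the final image stays inside the L∞ ball of radius ε without an explicit projection.

```python
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def ce_grad_wrt_input(x, W, y):
    """Gradient of cross-entropy(softmax(W @ x), y) w.r.t. x for a toy linear classifier."""
    p = softmax(W @ x)
    p[y] -= 1.0
    return W.T @ p


def momentum_attack(x0, W, y, eps, T, mu=1.0):
    """Untargeted momentum (MI-FGSM-style) update: m <- mu*m + g/||g||_1, x <- x + alpha*sign(m)."""
    alpha = eps / T  # step size alpha = eps / T, so T steps cannot exceed the L_inf budget
    x = x0.copy()
    m = np.zeros_like(x0)
    for _ in range(T):
        g = ce_grad_wrt_input(x, W, y)
        # L1-normalized gradient accumulated into momentum; 1e-12 guards against division by zero
        m = mu * m + g / (np.abs(g).sum() + 1e-12)
        x = x + alpha * np.sign(m)  # ascend the loss of the true label y
        x = np.clip(x, 0.0, 1.0)    # assumption: pixel values live in [0, 1]
    return x


# Toy usage: a 4-pixel "image" that the toy classifier assigns to class 0.
x0 = np.full(4, 0.5)
W = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
x_adv = momentum_attack(x0, W, y=0, eps=0.3, T=10)
print("prediction before:", int(np.argmax(W @ x0)), "after:", int(np.argmax(W @ x_adv)))
print("max perturbation:", float(np.max(np.abs(x_adv - x0))))
```

Normalizing each gradient by its L1 norm before accumulating keeps a single unusually large gradient from dominating the momentum, and taking the sign of the accumulated direction lets consistent directions reinforce each other across iterations instead of oscillating.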

Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum

Boosting Adversarial Attacks with Momentum

Efficient Adversarial Attack Based on Moment Estimation and Lookahead Gradient

SD-NAE: Generating Natural Adversarial Examples with Stable Diffusion

Boosting Adversarial Attacks with Nadam Optimizer

Enhancing Transferability of Adversarial Examples with Spatial Momentum

Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model

Discovering Adversarial Examples with Momentum

Adaptive momentum variance for attention-guided sparse adversarial attacks

Demiguise Attack: Crafting Invisible Semantic Adversarial Perturbations with Perceptual Similarity

Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models

StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model

Toward Robust Imperceptible Perturbation against Unauthorized Text-to-image Diffusion-based Synthesis

Boosting the Transferability of Adversarial Attacks with Global Momentum Initialization

Imperceptible Adversarial Attack via Invertible Neural Networks

Research and Application of the Median Filtering Method in Enhancing the Imperceptibility of Perturbations in Adversarial Examples

MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks

Improving Adversarial Transferability by Stable Diffusion

Transferable Adversarial Attack for Both Vision Transformers and Convolutional Networks Via Momentum Integrated Gradients

Adversarial example generation with AdaBelief optimizer and crop invariance