Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum

Nashrah Haque, Xiang Li, Zhehui Chen, Yanzhao Wu, Lei Yu, Arun Iyengar, Wenqi Wei
2024-10-17
Abstract: We propose a novel framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE), for generating adversarial examples that can effectively mislead neural network classifiers while maintaining visual imperceptibility and preserving the semantic similarity to the original class label. Our method leverages the text-to-image generation capabilities of the Stable Diffusion model by manipulating token embeddings corresponding to the specified class in its latent space. These token embeddings guide the generation of adversarial images that maintain high visual fidelity. The SD-MIAE framework consists of two phases: (1) an initial adversarial optimization phase that modifies token embeddings to produce misclassified yet natural-looking images and (2) a momentum-based optimization phase that refines the adversarial perturbations. By introducing momentum, our approach stabilizes the optimization of perturbations across iterations, enhancing both the misclassification rate and visual fidelity of the generated adversarial examples. Experimental results demonstrate that SD-MIAE achieves a high misclassification rate of 79%, improving by 35% over the state-of-the-art method while preserving the imperceptibility of adversarial perturbations and the semantic similarity to the original class label, making it a practical method for robust adversarial evaluation.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to generate adversarial examples that effectively mislead neural network classifiers while remaining visually imperceptible and semantically similar to the original class label. Specifically, the authors propose a new framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE), to address the following challenges:

1. **Visual Fidelity of Adversarial Examples**: Existing adversarial example generation methods tend to produce unnatural artifacts when perturbing token embeddings in high-dimensional latent spaces, which reduces the effectiveness of the attack and makes the perturbations easier to detect.
2. **Semantic Consistency**: The generated adversarial examples need to maintain semantic similarity to the original class label in order to remain deceptive.

To achieve these goals, the SD-MIAE framework consists of two stages:

- **Initial Adversarial Optimization Stage**: Modifies the token embeddings related to the specified class to generate images that look natural but are misclassified.
- **Momentum Optimization Stage**: Refines the adversarial perturbations with a momentum mechanism that stabilizes the updates across iterations, thereby increasing both the misclassification rate and visual fidelity.

Experimental results show that SD-MIAE achieves a high misclassification rate of 79%, a 35% improvement over the state-of-the-art method, while preserving the imperceptibility of the adversarial perturbations and the semantic similarity to the original class label. This makes SD-MIAE a practical method for evaluating the robustness of AI systems.

### Formula Summary

1. **Loss Function**:

   \[ \min \; -\ell(F(G(z; e_{\text{text}})), y) + \lambda \cdot R(\hat{e}_{k}^{\text{token}}, e_{k}^{\text{token}}) \]

   where:
   - \( \ell(F(G(z; e_{\text{text}})), y) \) is the adversarial loss, which aims to cause the classifier to misclassify.
   - \( R(\hat{e}_{k}^{\text{token}}, e_{k}^{\text{token}}) \) is the cosine similarity regularization term, used to maintain the natural appearance of the image.

2. **Momentum Update Formula**:

   \[ m_{t+1} = \mu \cdot m_t + \frac{\nabla_x \ell(F(x_t), y)}{\|\nabla_x \ell(F(x_t), y)\|_1} \]

   \[ x_{t+1} = x_t + \alpha \cdot \text{sign}(m_{t+1}) \]

   where:
   - \( m_t \) is the momentum term.
   - \( \alpha \) is the step size, set to \( \epsilon / T \) to satisfy the \( L_\infty \) constraint.
   - \( \epsilon \) is the perturbation size, and \( T \) is the total number of iteration steps.

Through these improvements, SD-MIAE not only improves the quality of the generated adversarial examples but also ensures that they are visually imperceptible and semantically consistent, providing a powerful tool for evaluating and enhancing the security of deep learning models.
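The momentum update in the formula summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a toy linear softmax classifier in place of \( F \), applies the update directly to an input vector rather than to a Stable Diffusion output, and uses hypothetical names (`cross_entropy_grad`, `momentum_attack`) and an assumed decay factor `mu=1.0`, which the summary does not specify.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def cross_entropy_grad(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. x for the toy
    classifier F(x) = softmax(W @ x + b) (an assumption for this sketch)."""
    probs = softmax(W @ x + b)
    probs[y] -= 1.0          # d(loss)/d(logits) = probs - onehot(y)
    return W.T @ probs       # chain rule back to the input


def momentum_attack(W, b, x0, y, eps=1.0, T=10, mu=1.0):
    """Iterate m_{t+1} = mu * m_t + g / ||g||_1 and
    x_{t+1} = x_t + alpha * sign(m_{t+1}) with alpha = eps / T."""
    alpha = eps / T          # T steps of size alpha stay within eps in L-infinity
    x = x0.copy()
    m = np.zeros_like(x0)
    for _ in range(T):
        g = cross_entropy_grad(W, b, x, y)
        m = mu * m + g / (np.abs(g).sum() + 1e-12)  # L1-normalised gradient
        x = x + alpha * np.sign(m)                  # ascend the loss
    return x


# Toy two-class example (all values are illustrative).
W = np.array([[2.0, -1.0, 0.5],
              [-1.0, 1.5, -0.5]])
b = np.zeros(2)
x0 = np.array([1.0, 0.0, 0.5])   # classified as class 0 by the toy model

x_adv = momentum_attack(W, b, x0, y=0, eps=1.0, T=10)
print("clean prediction:      ", int(np.argmax(W @ x0 + b)))
print("adversarial prediction:", int(np.argmax(W @ x_adv + b)))
print("L-infinity distance:   ", float(np.max(np.abs(x_adv - x0))))
```

Because each coordinate moves by at most \( \alpha = \epsilon / T \) per step, the total displacement after \( T \) steps cannot exceed \( \epsilon \), which is why this step size satisfies the \( L_\infty \) constraint without an explicit clipping step. Normalising the gradient by its \( L_1 \) norm keeps the accumulated momentum on a comparable scale across iterations, so a single large gradient does not dominate the update direction.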