Attention Masks Help Adversarial Attacks to Bypass Safety Detectors

Yunfan Shi

2024-11-07

Abstract: Despite recent advances in adversarial attack methods, current approaches against XAI monitors remain detectable and slow. In this paper, we present an adaptive framework for attention mask generation that enables stealthy, explainable and efficient PGD adversarial attacks on image classifiers under XAI monitors. Specifically, we use a mutation XAI mixture and a multitask self-supervised X-UNet to generate attention masks that guide the PGD attack. Experiments on MNIST (MLP) and CIFAR-10 (AlexNet) show that our system can outperform benchmark PGD, Sparsefool and the SOTA SINIFGSM in balancing stealth, efficiency and explainability, which is crucial for effectively fooling classifiers protected by SOTA defenses.

Cryptography and Security, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses three weaknesses of current adversarial attack methods when they operate under Explainable Artificial Intelligence (XAI) monitoring:

1. **Lack of covertness**: Existing adversarial attack methods (such as PGD) are easily detected by XAI monitors.
2. **Low efficiency**: These methods are slow to generate adversarial samples.
3. **Lack of interpretability**: It is difficult to explain how the adversarial samples produced by existing methods work.

To solve these problems, the author proposes an adaptive framework that uses attention mask generation to achieve covert, efficient and interpretable adversarial attacks. Specific techniques include:

- Using the multi-task self-supervised X-UNet model to generate attention masks that guide PGD attacks.
- Combining the mutation XAI mixture algorithm with multi-task self-supervised learning to improve the covertness and efficiency of attacks.
- Generating partial attention masks with methods such as Integrated Gradients and Layer-wise Relevance Propagation (LRP). These masks either guide PGD attacks directly or serve as partial labels for training X-UNet.

Experimental results on the MNIST and CIFAR-10 datasets show that this method improves the covertness, efficiency and interpretability of adversarial attacks, and that it outperforms existing methods such as the benchmark PGD, Sparsefool and SINIFGSM.

### Formula summary

1. **Loss function**:

   \[ \text{Loss} = \lambda_1 \cdot L_1(\text{advm}, \text{data}) + \lambda_2 \cdot L_1(\text{mask}, \text{mix}) + \lambda_3 \cdot \delta_{\text{acc}} \]

   where $\lambda_1, \lambda_2, \lambda_3$ are weight parameters, $L_1$ represents the L1 loss, and $\delta_{\text{acc}}$ represents the change in accuracy.

2. **Activation function**:

   \[ \text{SLU}(x, a = 0.5) = \max(0, x) + a \cdot \sin(x) \]

3. **Convolution weight initialization**: weights are drawn from a normal distribution with variance $\sigma^2$.

Through these improvements, the author reports better covertness, efficiency and interpretability for adversarial attacks, which lets them bypass existing XAI monitoring systems more effectively.
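The formulas in the summary can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's implementation: the function names (`slu`, `l1`, `combined_loss`, `masked_pgd_step`) are hypothetical, the L1 loss is assumed to be a mean absolute difference, images are treated as flat lists of floats in [0, 1], and the masked step is only one plausible reading of "attention masks guide PGD" (the mask scales the sign-gradient update per pixel). The gradient is passed in rather than computed, since the summary does not describe the classifier.

```python
import math


def slu(x, a=0.5):
    """SLU(x, a) = max(0, x) + a * sin(x), as given in the formula summary."""
    return max(0.0, x) + a * math.sin(x)


def l1(u, v):
    """Mean absolute difference (assumed reduction; the source only says 'L1 loss')."""
    return sum(abs(p - q) for p, q in zip(u, v)) / len(u)


def combined_loss(advm, data, mask, mix, delta_acc,
                  lam1=1.0, lam2=1.0, lam3=1.0):
    """Weighted sum of the three terms, using the three weights the summary names.

    advm / data: masked adversarial image and original image (flat lists).
    mask / mix:  predicted attention mask and the XAI mixture target.
    delta_acc:   change in accuracy, supplied by the caller.
    """
    return lam1 * l1(advm, data) + lam2 * l1(mask, mix) + lam3 * delta_acc


def masked_pgd_step(x_adv, x_orig, grad, mask, step=0.1, eps=0.3):
    """One sign-gradient step scaled per pixel by the mask (assumed mechanism).

    After the step, each pixel is projected back into the eps-ball around
    the original pixel and then clipped to the valid range [0, 1].
    """
    out = []
    for xa, xo, g, m in zip(x_adv, x_orig, grad, mask):
        sign = (g > 0) - (g < 0)                       # sign of the gradient
        moved = xa + step * m * sign                   # mask gates the update
        moved = min(max(moved, xo - eps), xo + eps)    # L-infinity projection
        out.append(min(max(moved, 0.0), 1.0))          # keep a valid pixel value
    return out


if __name__ == "__main__":
    x = [0.5, 0.5]
    # Only the first pixel is inside the mask, so only it is perturbed.
    print(masked_pgd_step(x, x, grad=[1.0, -1.0], mask=[1.0, 0.0]))
    print(slu(-math.pi / 2))
```

Pixels where the mask is zero are left untouched, which is the intuition behind the stealth claim: the perturbation is confined to the regions the mask selects. The default values for `step`, `eps` and the three weights are placeholders, not values from the paper.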

Adversarial Attacks and Mitigation for Anomaly Detectors of Cyber-Physical Systems

NetGuard: Protecting Commercial Web APIs from Model Inversion Attacks Using GAN-generated Fake Samples

ATTEQ-NN: Attention-based QoE-aware Evasive Backdoor Attacks.

A Universal Defense Strategy Against Adversarial Attacks Based on Attention-Guided

Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection

Misleading attention and classification: An adversarial attack to fool object detection models in the real world

Imperceptible Adversarial Attack with Multi-granular Spatio-temporal Attention for Video Action Recognition

Attention, Please! Adversarial Defense via Activation Rectification and Preservation

Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World

Multiclass ASMA vs Targeted PGD Attack in Image Segmentation

Exploiting vulnerabilities of deep neural networks for privacy protection

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks

Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

ADS-detector: An attention-based dual stream adversarial example detection method

Imperceptible Face Forgery Attack via Adversarial Semantic Mask

Invisible Adversarial Attack Against Deep Neural Networks: an Adaptive Penalization Approach

Stealthy Multi-Task Adversarial Attacks

Robust Superpixel-Guided Attentional Adversarial Attack

Designing defensive techniques to handle adversarial attack on deep learning based model