Abstract: Adversarial examples represent a serious issue for the application of machine learning models in many sensitive domains. For generating adversarial examples, decision-based black-box attacks are one of the most practical techniques, as they only require query access to the model. Triangle Attack (TA) is one of the most recently proposed state-of-the-art decision-based black-box attacks. In this paper, we offer a high-level description of TA and explain its potential theoretical limitations. We then propose a new decision-based black-box attack, Triangle Attack with Reinforcement Learning (TARL), which addresses the limitations of TA by leveraging reinforcement learning. The resulting attack achieves similar, if not better, attack accuracy than TA with half as many queries on state-of-the-art classifiers and defenses across ImageNet and CIFAR-10.
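In the decision-based (hard-label) setting, the attacker can only submit an input and observe the predicted label, and every submission counts against a query budget. The minimal sketch below assumes NumPy and a hypothetical `predict_label` callable that stands in for the victim model. It shows such an oracle, the RMSE distortion measure, and a success-rate helper. `HardLabelOracle`, `QueryBudgetExceeded`, and `success_rate` are illustrative names, not the paper's code. The success criterion (best adversarial example found within budget has RMSE at or below a threshold, defaulting to the 0.05 value mentioned in the experiments) is a common convention and is assumed here, not taken from the paper's protocol.

```python
import numpy as np


class QueryBudgetExceeded(RuntimeError):
    """Raised when an attack tries to spend more queries than its budget allows."""


class HardLabelOracle:
    """Decision-based access: each query reveals only whether the predicted label differs from the true one."""

    def __init__(self, predict_label, true_label, budget):
        # predict_label is a hypothetical callable: image array -> integer class label.
        self._predict_label = predict_label
        self.true_label = true_label
        self.budget = budget
        self.queries = 0

    def is_adversarial(self, x):
        """Spend one query; return True if the model no longer predicts the true label."""
        if self.queries >= self.budget:
            raise QueryBudgetExceeded(f"query budget of {self.budget} exhausted")
        self.queries += 1
        return self._predict_label(x) != self.true_label


def rmse(x, x_adv):
    """Root-mean-square difference between the benign and the perturbed image."""
    diff = np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def success_rate(best_rmses, threshold=0.05):
    """Fraction of images whose best adversarial example has RMSE <= threshold.

    None marks an image for which no adversarial example was found within the budget.
    """
    hits = [r is not None and r <= threshold for r in best_rmses]
    return sum(hits) / len(hits)


# Toy classifier (assumption): label 1 if the mean pixel value exceeds 0.5, else 0.
oracle = HardLabelOracle(lambda img: int(img.mean() > 0.5), true_label=0, budget=500)
x = np.full((8, 8), 0.45)
x_adv = np.full((8, 8), 0.52)
print(oracle.is_adversarial(x_adv), oracle.queries)  # True 1
print(round(rmse(x, x_adv), 2))  # 0.07
print(success_rate([0.03, 0.07, None, 0.05]))  # 0.5
```

Counting queries inside the oracle, rather than in the attack loop, makes it harder for an attack implementation to exceed its budget silently. This matters when comparing attacks at a fixed budget such as 500 queries.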

What problem does this paper attempt to address?

The paper asks how to improve existing decision-based black-box attacks, in particular Triangle Attack (TA), so that they need fewer queries and achieve a higher attack success rate. TA has theoretical limitations in how it generates adversarial examples. To address them, the paper proposes a new decision-based black-box attack built on reinforcement learning, Triangle Attack with Reinforcement Learning (TARL). TARL targets TA's weakness in adjusting the angle parameter α, with the goal of more efficient attacks.

### Main problems and solutions

1. **Limitations of existing methods**:
   - **Triangle Attack (TA)** is an efficient decision-based black-box attack, but its update rule for the angle parameter α is limited. According to Proposition 1, a smaller α is more likely to yield an adversarial example, while a larger α yields a smaller perturbation. TA's update rule cannot always find the best α for this trade-off, and in some cases it causes the attack to fail.
2. **Proposed method**:
   - **TARL** uses Q-learning, a reinforcement learning algorithm, to optimize the updates of α. It trains an agent on historical data so that the agent adapts α to the shape of the decision boundary. This lets TARL find better adversarial examples with fewer queries.

### Experimental verification

To evaluate TARL, the authors ran the following experiments:

- **Experimental setup**: two datasets (ImageNet and CIFAR-10), 10 victim models, and 1 defense model (a diffusion model).
- **Experimental results**: under a query budget of 500, TARL matches or exceeds TA's attack success rate on most models. The improvement is particularly large at an RMSE of 0.05.

### Summary

The main contributions of the paper are:

1. Proposing TARL, which addresses TA's limitations in updating the angle parameter α.
2. Showing experimentally that TARL maintains or even increases the attack success rate while using fewer queries.
3. Broadening the experimental scope to more models and datasets, which further supports TARL's generalization and robustness.

Through these improvements, TARL makes decision-based black-box attacks more efficient and offers new ideas for future research on adversarial attacks.
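The summary says TARL trains a Q-learning agent on historical data to adapt α to the decision-boundary shape. It does not give the agent's states, actions, or rewards, so the Python sketch below is only an illustration of tabular Q-learning applied to that idea. All of the following choices are hypothetical:

- States are points on a fixed α grid (`ALPHAS`).
- Actions decrease α, keep it, or increase it by one grid step.
- A toy `step` function replaces real model queries. It declares a query successful whenever α is at most a boundary-dependent limit `alpha_max`.
- Mirroring the trade-off attributed to Proposition 1, a success is rewarded with the current α (larger α, smaller perturbation) and a failure costs -1.

None of these names or numbers come from the paper.

```python
import numpy as np

# Hypothetical discretisation of TA's angle parameter alpha (not the paper's grid).
ALPHAS = np.linspace(0.1, 1.5, 15)
ACTIONS = (-1, 0, +1)  # decrease alpha, keep it, increase it by one grid step


def step(state, action_idx, alpha_max):
    """Toy stand-in for one TA query (assumption, not a real attack step).

    The query "succeeds" iff the new alpha is at most alpha_max, a proxy for the
    decision-boundary shape. Success pays the alpha value (larger alpha standing in
    for a smaller perturbation); failure costs -1.
    """
    next_state = min(max(state + ACTIONS[action_idx], 0), len(ALPHAS) - 1)
    success = ALPHAS[next_state] <= alpha_max
    reward = float(ALPHAS[next_state]) if success else -1.0
    return next_state, reward


def train_q_table(alpha_max, episodes=3000, horizon=20, lr=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over the alpha grid."""
    rng = np.random.default_rng(seed)
    q = np.zeros((len(ALPHAS), len(ACTIONS)))
    for _ in range(episodes):
        state = int(rng.integers(len(ALPHAS)))  # random starting alpha
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = int(rng.integers(len(ACTIONS)))  # explore
            else:
                a = int(np.argmax(q[state]))  # exploit current estimates
            next_state, reward = step(state, a, alpha_max)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            q[state, a] += lr * (reward + gamma * q[next_state].max() - q[state, a])
            state = next_state
    return q


def greedy_rollout(q, start=0, steps=30):
    """Follow the learned greedy policy from a starting grid index; return the final alpha."""
    state = start
    for _ in range(steps):
        state = min(max(state + ACTIONS[int(np.argmax(q[state]))], 0), len(ALPHAS) - 1)
    return float(ALPHAS[state])


q = train_q_table(alpha_max=0.95)
print(round(greedy_rollout(q), 2))  # 0.9
```

Trained against a limit of 0.95, the greedy policy climbs to the largest grid value that still succeeds (0.9) and stays there. With a different `alpha_max` it settles elsewhere, which is the kind of boundary-dependent adaptation the summary describes. A real attack would replace `step` with actual TA queries, and the state and reward would have to be built from observed query outcomes and measured perturbation size rather than from α alone.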

Theoretical Corrections and the Leveraging of Reinforcement Learning to Enhance Triangle Attack