Nicole Meng, Caleb Manicke, David Chen, Yingjie Lao, Caiwen Ding, Pengyu Hong, Kaleel Mahmood
Abstract: Adversarial examples represent a serious issue for the application of machine learning models in many sensitive domains. For generating adversarial examples, decision based black-box attacks are one of the most practical techniques as they only require query access to the model. One of the most recently proposed state-of-the-art decision based black-box attacks is Triangle Attack (TA). In this paper, we offer a high-level description of TA and explain potential theoretical limitations. We then propose a new decision based black-box attack, Triangle Attack with Reinforcement Learning (TARL). Our new attack addresses the limits of TA by leveraging reinforcement learning. This creates an attack that can achieve similar, if not better, attack accuracy than TA with half as many queries on state-of-the-art classifiers and defenses across ImageNet and CIFAR-10.
What problem does this paper attempt to address?
The paper asks how to improve existing decision-based black-box attacks, in particular Triangle Attack (TA), so that they need fewer queries while achieving an equal or higher attack success rate. To address the theoretical limitations of TA in generating adversarial examples, the paper proposes a new decision-based black-box attack built on reinforcement learning: Triangle Attack with Reinforcement Learning (TARL). TARL targets the shortcomings in how TA adjusts the angle parameter α, with the aim of making the attack more query-efficient.
### Main problems and solutions
1. **Limitations of existing methods**:
- **Triangle Attack (TA)** is an efficient decision-based black-box attack, but it is limited in how it updates the angle parameter α. According to Proposition 1, a smaller angle α is more likely to yield an adversarial example, while a larger angle α results in a smaller perturbation. TA's update rule is not guaranteed to find the best trade-off between the two, and in some cases it can cause the attack to fail.
2. **Proposal of a new method**:
- **TARL** improves the update of the angle parameter α by applying Q-learning, a reinforcement learning algorithm. Specifically, TARL trains an agent on historical data so that it can adapt α to different decision-boundary shapes. This enables TARL to find better adversarial examples with fewer queries.
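To make the Q-learning idea concrete, the following is a minimal sketch of a tabular Q-learning loop that nudges α up or down. It is not the authors' implementation: the three-action set, the state encoding, the step size, the clipping bounds, and the helper names `choose_action`, `q_update`, and `apply_action` are all assumptions chosen for illustration, since the summary does not specify how TARL defines states, actions, or rewards.

```python
import random

# Hypothetical action set: shrink alpha, keep it, or enlarge it by one step.
ACTIONS = (-1, 0, 1)


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy selection over the hypothetical action set."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore
    values = [q_table.get((state, a), 0.0) for a in ACTIONS]
    return ACTIONS[values.index(max(values))]  # exploit (first max on ties)


def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """Standard Q-learning update:
    Q(s, a) <- Q(s, a) + lr * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + lr * (reward + gamma * best_next - old)
    return q_table[(state, action)]


def apply_action(alpha, action, step, alpha_min, alpha_max):
    """Move alpha by one step in the chosen direction, clipped to bounds."""
    return min(alpha_max, max(alpha_min, alpha + action * step))


if __name__ == "__main__":
    rng = random.Random(0)
    q_table = {}
    # Assumed state: whether the previous query was adversarial.
    state, next_state = "was_adversarial", "was_adversarial"
    action = choose_action(q_table, state, epsilon=0.1, rng=rng)
    # Assumed reward: positive when the new candidate is still adversarial.
    q_update(q_table, state, action, reward=1.0, next_state=next_state)
    print(apply_action(0.5, action, step=0.05, alpha_min=0.0, alpha_max=1.5))
```

In this sketch the reward would encode the trade-off from Proposition 1: enlarging α is rewarded only while the candidate stays adversarial. How TARL actually shapes its reward is not stated in this summary.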
### Experimental verification
To verify the effectiveness of TARL, the authors conducted the following experiments:
- **Experimental setup**: Two datasets (ImageNet and CIFAR-10), 10 victim models, and 1 defense model (a diffusion model) were used.
- **Experimental results**: Under a query budget of 500, TARL matches or exceeds TA's attack success rate on most models. In particular, when the RMSE is 0.05, TARL's attack success rate increases significantly.
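The reported metric can be sketched as follows, assuming an attack counts as successful when it produces an adversarial example whose RMSE from the original input is at most a threshold and the query count stays within the budget. The function names, the tuple layout of `results`, and this exact success criterion are assumptions for illustration; the paper's evaluation code may differ.

```python
import math


def rmse(original, adversarial):
    """Root-mean-square error between two equal-length flattened inputs."""
    n = len(original)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, adversarial)) / n)


def attack_success_rate(results, rmse_threshold=0.05, query_budget=500):
    """Fraction of attacked samples that count as successes.

    results: list of (rmse_value, queries_used, is_adversarial) tuples,
    a hypothetical record format chosen for this sketch.
    """
    if not results:
        return 0.0
    hits = sum(
        1
        for value, queries, is_adv in results
        if is_adv and value <= rmse_threshold and queries <= query_budget
    )
    return hits / len(results)
```

Under this definition, comparing two attacks at the same `rmse_threshold` and `query_budget` shows which one finds small perturbations more often for a fixed number of queries.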
### Summary
The main contributions of the paper are:
1. Proposing TARL, which addresses the limitations of TA in updating the angle parameter α.
2. Showing experimentally that TARL maintains or increases the attack success rate while reducing the number of queries.
3. Broadening the experimental scope to more models and datasets, which further supports TARL's generalization ability and robustness.
With these improvements, TARL makes decision-based black-box attacks more efficient and suggests new directions for future research on adversarial attacks.