Abstract: Our paper presents a novel defence against black box attacks, where attackers use the victim model as an oracle to craft their adversarial examples. Unlike traditional preprocessing defences that rely on sanitizing input samples, our stateless strategy counters the attack process itself. For every query we evaluate a counter-sample instead, where the counter-sample is the original sample optimized against the attacker's objective. By countering every black box query with a targeted white box optimization, our strategy effectively introduces an asymmetry to the game to the defender's advantage. This defence not only effectively misleads the attacker's search for an adversarial example, it also preserves the model's accuracy on legitimate inputs and is generic to multiple types of attacks.

What problem does this paper attempt to address?

This paper addresses the problem of defending deep-learning models against black-box adversarial attacks. Specifically, the authors propose a novel stateless strategy that resists these attacks by generating counter-samples to mislead the attacker. The problem is interpreted in detail below.

### 1. **Problem Background**

Deep neural networks are vulnerable to adversarial samples: inputs deliberately crafted by applying small perturbations so that the model misclassifies them. An adversarial sample can be generated by solving the following optimization objective:

\[ \delta^* = \arg \min_{\delta} \|\delta\|_p \quad \text{subject to} \quad f(x + \delta) \neq f(x) \quad \text{and} \quad \|\delta\|_p \leq \epsilon \]

Here, \( f \) represents the attacked model, and the condition \( \|\delta\|_p \leq \epsilon \) ensures that the adversarial sample \( x' = x + \delta \) is visually almost indistinguishable from the original sample \( x \).

In white-box attacks, the attacker can access the internal parameters of the model and directly calculate the gradient to find \( \delta \). In many practical applications, however, the attacker can only obtain information by querying the model; this is called a black-box attack. Black-box attacks usually rely on querying the model to estimate the loss and gradually optimize \( \delta \).

### 2. **Limitations of Existing Methods**

Existing defense mechanisms fall into three main categories: pre-processing, detection, and model-strengthening techniques. Pre-processing methods sanitize individual input samples rather than countering the attack process itself; they may also reduce the model's performance on normal inputs and defend poorly against adaptive attackers.

### 3. **Solution Proposed in the Paper**

To solve the above problems, the authors propose a new pre-processing method: **counter-sample defense**. The main features of this method are as follows:

- **Statelessness**: There is no need to track a user's historical queries, so the defense scales well.
- **Asymmetry in Optimization Capabilities**: The defense exploits the difference in capabilities between the attacker and the defender. The attacker is limited to black-box queries, while the defender can perform multiple white-box optimizations.
- **Preservation of Clean Task Performance**: The defense does not significantly affect the accuracy of the model on normal inputs.

### 4. **Specific Method**

For each query \( x_t \), the defender generates a counter-sample \( x_t^* \) that lies closer to the query's predicted category. This can be achieved by gradient descent:

\[ x_{t + 1}^* = x_t^* - \alpha \nabla_{x_t^*} L(f(x_t^*; \theta), \hat{y}) \]

Here, \( \alpha \) is the learning rate, \( \nabla_{x_t^*} L \) is the gradient of the loss function \( L \) with respect to \( x_t^* \), and \( \hat{y} \) is the label predicted by the model. In this way, the defender misleads the attacker at each attack iteration, making it difficult for the attacker to find effective adversarial samples.

### 5. **Experimental Results**

The authors evaluated this method on the CIFAR-10 and ImageNet datasets. The results show that it defends effectively against a variety of state-of-the-art black-box attacks and is superior to other defense methods in maintaining the model's clean-task performance.

In conclusion, this paper proposes an innovative stateless defense strategy.
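
The gradient-descent update described under "Specific Method" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy linear softmax classifier as a stand-in for \( f(\cdot; \theta) \), cross-entropy as the loss \( L \), and treats the update as a fixed number of descent steps applied to a single query. The names `predict_proba`, `counter_sample`, and `defended_query`, and the values of `alpha` and `steps`, are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def predict_proba(x, W, b):
    """Toy stand-in for f(x; theta): a linear softmax classifier (assumption)."""
    return softmax(W @ x + b)


def counter_sample(x, W, b, alpha=0.1, steps=10):
    """Move the query x toward its own predicted class by gradient descent.

    Repeats x* <- x* - alpha * grad_x L(f(x*), y_hat), where y_hat is the
    label the model predicts for the original query and L is cross-entropy.
    """
    y_hat = int(np.argmax(predict_proba(x, W, b)))
    one_hot = np.eye(W.shape[0])[y_hat]
    x_star = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        p = predict_proba(x_star, W, b)
        # For a linear softmax model, d(cross-entropy)/dx = W^T (p - one_hot).
        grad = W.T @ (p - one_hot)
        x_star = x_star - alpha * grad
    return x_star, y_hat


def defended_query(x, W, b, **kwargs):
    """Answer a query with the model's output on the counter-sample."""
    x_star, _ = counter_sample(x, W, b, **kwargs)
    return predict_proba(x_star, W, b)


if __name__ == "__main__":
    # Hypothetical weights and query, chosen so the query sits near a boundary.
    W = np.array([[1.0, -0.5, 0.2, 0.0],
                  [-0.3, 0.8, 0.1, 0.4],
                  [0.2, 0.1, -0.7, 0.5]])
    b = np.zeros(3)
    x = np.array([0.5, 0.1, -0.2, 0.3])
    print("undefended output:", predict_proba(x, W, b))
    print("defended output:  ", defended_query(x, W, b))
```

In this sketch the defended output keeps the predicted label of the original query but reports it with higher confidence. This matches the summary's description of a counter-sample as lying closer to its predicted category. A real deployment would replace the analytic gradient with automatic differentiation through the actual network.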

Counter-Samples: A Stateless Strategy to Neutralize Black Box Adversarial Attacks
