Abstract: We study the unique, less well-understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries. Sparse attacks aim to discover a minimum number of perturbations to model inputs, bounded in the \(l_0\) norm, to craft adversarial examples and misguide model decisions. But, in contrast to query-based dense attack counterparts against black-box models, constructing sparse adversarial perturbations, even when models serve confidence score information to queries in a score-based setting, is non-trivial, because such an attack leads to i) an NP-hard problem; and ii) a non-differentiable search space. We develop BruSLeAttack, a new, faster (more query-efficient) Bayesian algorithm for the problem. We conduct extensive attack evaluations, including an attack demonstration against a Machine Learning as a Service (MLaaS) offering exemplified by Google Cloud Vision, and robustness testing of adversarial training regimes and a recent defense against black-box attacks. The proposed attack scales to achieve state-of-the-art attack success rates and query efficiency on standard computer vision tasks such as ImageNet across different model architectures. Our artefacts and DIY attack samples are available on GitHub. Importantly, our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems.

What problem does this paper attempt to address?

The paper addresses the problem of generating sparse adversarial examples in a black-box setting, using only the confidence scores the model returns in response to queries. Specifically, given an input image, the goal is to find an adversarial example that changes as few pixels as possible (i.e., a perturbation under an \(l_0\) constraint) while still misleading the machine-learning model into a wrong decision. The challenges of this problem are as follows:

1. **NP-hard problem**: Minimizing the \(l_0\) norm is NP-hard, so finding the optimal solution in a high-dimensional space is very difficult.
2. **Non-differentiable search space**: Because of the properties of the \(l_0\) norm, the search space is discontinuous and non-differentiable, so traditional gradient-based optimization methods cannot be applied directly.
3. **Mixed search space**: Generating a sparse adversarial example requires deciding both which pixels to perturb and which color values to assign to them, which makes the search space more complex.

To address these problems, the paper proposes a new algorithm, BruSLeAttack, a fast (more query-efficient) sparse adversarial attack based on a Bayesian framework. BruSLeAttack tackles the challenges above as follows:

- **Reducing the dimension of the search space**: By introducing a synthetic color image \(x'\), the color value of each pixel is limited to a discrete space, which significantly reduces the dimension of the search space.
- **Bayesian framework**: The attack uses the history of pixel manipulations to learn the influence of each pixel and guides the selection of new pixels accordingly, so that sparse adversarial examples are found more efficiently.
- **Pixel dissimilarity map**: By calculating the pixel dissimilarity between the source image and the synthetic color image, the attack accelerates the movement of the adversarial example towards the decision boundary.

The paper verifies the effectiveness of BruSLeAttack through experiments on multiple datasets (CIFAR-10, STL-10, and ImageNet). The experimental results show that BruSLeAttack significantly improves the attack success rate (ASR) across different query budgets and sparsity levels. They also show that, under sparse adversarial attacks, Vision Transformers (ViT) are more robust than convolutional neural networks (CNN).
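The three ideas listed above (a synthetic color image, a learned per-pixel influence, and a dissimilarity prior) can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the 0/1 color palette, the swap-based proposal, and the rule that adds one pseudo-count to a pixel after a successful swap are all hypothetical choices made for brevity. `score_fn` is an assumed stand-in for a model query that returns a loss to minimize (for example, the confidence of the true class).

```python
import numpy as np


def apply_mask(x, x_synth, mask):
    """Copy synthetic colours into x wherever the boolean (H, W) mask is True."""
    return np.where(mask[..., None], x_synth, x)


def dissimilarity_map(x, x_synth):
    """Per-pixel Euclidean distance between the source and synthetic images."""
    return np.linalg.norm(x - x_synth, axis=-1)


def sparse_attack_sketch(x, score_fn, k, n_queries, rng, n_swap=1):
    """Search for a k-pixel perturbation that lowers score_fn (illustrative only)."""
    h, w, _ = x.shape
    n = h * w
    # Assumption: each channel of the synthetic image is 0 or 1, so the
    # colour of a perturbed pixel comes from a small discrete set.
    x_synth = rng.integers(0, 2, size=x.shape).astype(x.dtype)
    # Prior: pixels that differ more from the synthetic image are preferred.
    prior = dissimilarity_map(x, x_synth).ravel() + 1e-6
    # Dirichlet-style pseudo-counts that track how useful each pixel has been.
    alpha = np.ones(n)

    # Initial mask: k distinct pixels sampled in proportion to the prior.
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=k, replace=False, p=prior / prior.sum())] = True
    best_loss = score_fn(apply_mask(x, x_synth, mask.reshape(h, w)))

    for _ in range(n_queries - 1):
        # Sample pixel influences and weight them by the dissimilarity prior.
        theta = rng.dirichlet(alpha) * prior
        inside, outside = np.flatnonzero(mask), np.flatnonzero(~mask)
        # Drop the least influential perturbed pixels ...
        drop = inside[np.argsort(theta[inside])[:n_swap]]
        # ... and add unperturbed pixels sampled by their influence.
        p_add = theta[outside] / theta[outside].sum()
        add = rng.choice(outside, size=n_swap, replace=False, p=p_add)
        cand = mask.copy()
        cand[drop] = False
        cand[add] = True
        loss = score_fn(apply_mask(x, x_synth, cand.reshape(h, w)))
        if loss < best_loss:
            # Keep the improvement and reward the pixels that produced it.
            mask, best_loss = cand, loss
            alpha[add] += 1.0

    mask2d = mask.reshape(h, w)
    return apply_mask(x, x_synth, mask2d), mask2d, best_loss
```

Because the mask always contains exactly `k` pixels, the returned image differs from the source in at most `k` pixels, which is the \(l_0\) budget. Each loop iteration costs one call to `score_fn`, so `n_queries` corresponds directly to the query budget.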

BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Towards Efficient Data Free Blackbox Adversarial Attack

Improving Query Efficiency of Black-box Adversarial Attack

Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models

Simple Black-box Adversarial Attacks

Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks

Black-box Bayesian Adversarial Attack with Transferable Priors

Sparse Attack with Meta-Learning

Sparse Black-Box Multimodal Attack for Vision-Language Adversary Generation

Sparse and Imperceivable Adversarial Attacks

Black-box Adversarial Attacks with Limited Queries and Information

BlackboxBench: A Comprehensive Benchmark of Black-box Adversarial Attacks

Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence

Square Attack: A Query-Efficient Black-Box Adversarial Attack via Random Search

Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems

Local Black-box Adversarial Attacks: A Query Efficient Approach

Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models

Projection Probability-Driven Black-Box Attack

A CMA-ES-Based Adversarial Attack on Black-Box Deep Neural Networks

QEBA: Query-Efficient Boundary-Based Blackbox Attack