Abstract: Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models by iteratively querying the target model or by leveraging transferability from a local surrogate model. However, such attacks can be effectively mitigated by recent state-of-the-art (SOTA) defenses, e.g., detecting the pattern of sequential queries or injecting noise into the model. To the best of our knowledge, we take the first step to study a new paradigm of black-box attacks with provable guarantees: certifiable black-box attacks that can guarantee the attack success probability (ASP) of adversarial examples before querying the target model. Compared to traditional empirical black-box attacks, this new attack unveils significant vulnerabilities of machine learning models, e.g., breaking strong SOTA defenses with provable confidence, constructing a space of (infinite) adversarial examples with high ASP, and theoretically guaranteeing the ASP of the generated adversarial examples without verification or queries over the target model. Specifically, we establish a novel theoretical foundation for ensuring the ASP of the black-box attack with randomized adversarial examples (AEs). Then, we propose several novel techniques to craft the randomized AEs while reducing the perturbation size for better imperceptibility. Finally, we comprehensively evaluate the certifiable black-box attacks on the CIFAR10/100, ImageNet, and LibriSpeech datasets, benchmarking against 16 SOTA black-box attacks and various SOTA defenses in the domains of computer vision and speech recognition. Both theoretical and experimental results validate the significance of the proposed attack. The code and all the benchmarks are available at https://github.com/datasec-lab/CertifiedAttack.

Boosting Decision-Based Black-Box Adversarial Attacks with Random Sign Flip

Efficient Decision-Based Black-Box Adversarial Attacks on Face Recognition

Adaptive Perturbation for Adversarial Attack

Boosting Black-box Adversarial Attack with a Better Convergence

Perception-Driven Imperceptible Adversarial Attack Against Decision-Based Black-Box Models

AutoDA: Automated Decision-based Iterative Adversarial Attacks

Targeted Black-Box Adversarial Attack Method for Image Classification Models

Improving Query Efficiency of Black-box Adversarial Attack

Decision-based Query Efficient Adversarial Attack Via Adaptive Boundary Learning

Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors

ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks

Decision-BADGE: Decision-based Adversarial Batch Attack with Directional Gradient Estimation

Delving into Decision-based Black-box Attacks on Semantic Segmentation

An Approximated Gradient Sign Method Using Differential Evolution For Black-box Adversarial Attack

Towards Black-Box Adversarial Attacks on Interpretable Deep Learning Systems

One-bit Flip is All You Need: when Bit-flip Attack Meets Model Training

Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence

Switching Gradient Directions for Query-Efficient Black-Box Adversarial Attacks

Targeted Attack Against Deep Neural Networks Via Flipping Limited Weight Bits

Attacking Adversarial Attacks as A Defense