Abstract: Saliency detection is an effective front-end process for many security-related tasks, e.g., autonomous driving and tracking. Adversarial attacks serve as an efficient surrogate for evaluating the robustness of deep saliency models before they are deployed in the real world. However, most current adversarial attacks exploit gradients spanning the entire image space to craft adversarial examples. This ignores the fact that natural images are high-dimensional and spatially redundant, which leads to high attack cost and easily perceptible perturbations. To circumvent these issues, this paper builds an efficient bridge between accessible partially-white-box source models and unknown black-box target models. The proposed method consists of two steps: 1) We design a new partially-white-box attack that defines the cost function in the compact hidden space. It penalizes only the fraction of feature activations corresponding to salient regions, instead of penalizing every pixel across the entire dense output space, which reduces the redundancy of the adversarial perturbation. 2) We take the non-redundant perturbations from several source models as prior cues and use an iterative zeroth-order optimizer to compute directional derivatives along these non-redundant prior directions, in order to estimate the actual gradient of the black-box target model. The non-redundant priors boost the update of "critical" pixels located at the non-zero coordinates of the prior cues, while leaving the redundant pixels located at the zero coordinates unaffected. Our method achieves the best tradeoff between attack ability and perturbation redundancy. Finally, we conduct comprehensive experiments that test the robustness of 18 state-of-the-art deep saliency models against 16 malicious attacks under both white-box and black-box settings, contributing a new robustness benchmark to the saliency community for the first time.
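The second step (prior-guided zeroth-order gradient estimation) can be illustrated with a minimal pure-Python sketch. It assumes a scalar black-box attack loss `f` that can only be queried, sparse prior perturbations from source models given as flat vectors, forward finite differences with step `sigma`, Gram-Schmidt orthonormalization of the priors, and a sign-gradient update projected into an L-infinity ball of radius `eps`. All function names, these design choices, and the parameter values are hypothetical and chosen for illustration; this is not the authors' implementation. Because the estimate lies in the span of the priors, coordinates that are zero in every prior get a zero estimate and are never updated.

```python
import math


def orthonormalize(priors, tol=1e-12):
    """Gram-Schmidt: turn the (sparse) prior perturbations into an orthonormal basis."""
    basis = []
    for p in priors:
        v = [float(a) for a in p]
        for b in basis:
            c = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip priors that are (numerically) linearly dependent
            basis.append([vi / norm for vi in v])
    return basis


def estimate_gradient(f, x, priors, sigma=1e-3):
    """Zeroth-order estimate of grad f(x) projected onto span(priors).

    Uses one query for f(x) plus one forward-difference query per basis direction.
    Coordinates that are zero in every prior stay zero in the estimate.
    """
    fx = f(x)
    g = [0.0] * len(x)
    for u in orthonormalize(priors):
        x_shift = [xi + sigma * ui for xi, ui in zip(x, u)]
        d = (f(x_shift) - fx) / sigma  # directional derivative along u
        g = [gi + d * ui for gi, ui in zip(g, u)]
    return g


def sign(a):
    # sign(0) == 0, so coordinates with a zero gradient estimate do not move
    return (a > 0) - (a < 0)


def prior_guided_attack(f, x0, priors, eps=0.1, step=0.02, iters=10, sigma=1e-3):
    """Iterative sign-gradient ascent on the black-box loss f within an L-inf ball."""
    x = list(x0)
    for _ in range(iters):
        g = estimate_gradient(f, x, priors, sigma)
        x = [xi + step * sign(gi) for xi, gi in zip(x, g)]
        # Project back into the L-inf ball of radius eps around the clean input.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


# Toy usage (hypothetical): a linear "black-box" loss the attacker can only query.
w = [0.5, -1.0, 2.0, 0.3]


def f(x):
    return sum(wi * xi for wi, xi in zip(w, x))


x0 = [0.0, 0.0, 0.0, 0.0]
priors = [[0.0, -1.0, 1.0, 0.0]]  # sparse prior: only coordinates 1 and 2 are non-zero
x_adv = prior_guided_attack(f, x0, priors)
print(x_adv)  # [0.0, -0.1, 0.1, 0.0]
```

In this toy run only the two coordinates covered by the prior move (to the ±eps boundary), while the redundant coordinates stay at zero. Each iteration costs one query for f(x) plus one query per independent prior direction, so the query budget scales with the number of priors rather than with the input dimension; the trade-off is that only the projection of the true gradient onto the priors' span is recovered.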

Adaptive Gradient-based Word Saliency for Adversarial Text Attacks

Generating Natural Language Adversarial Examples Through Probability Weighted Word Saliency

A Modified Word Saliency-Based Adversarial Attack on Text Classification Models

Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework

HyGloadAttack: Hard-label black-box textual adversarial attacks via hybrid optimization

Textual Adversarial Attack As Combinatorial Optimization

Query-Efficient Adversarial Attack with Low Perturbation Against End-to-End Speech Recognition Systems

Word-level Textual Adversarial Attacking as Combinatorial Optimization

Semantic-Preserving Adversarial Text Attacks

Generation-based Parallel Particle Swarm Optimization for Adversarial Text Attacks

TextTricker: Loss-based and gradient-based adversarial attacks on text classification models

Saliency Attention and Semantic Similarity-Driven Adversarial Perturbation

Bigram and Unigram Based Text Attack Via Adaptive Monotonic Heuristic Search

TextCheater: A Query-Efficient Textual Adversarial Attack in the Hard-Label Setting

Improving Query Efficiency of Black-Box Attacks via the Preference of Deep Learning Models

TextHacker: Learning based Hybrid Local Search Algorithm for Text Hard-label Adversarial Attack

Adversarial Attack Against Deep Saliency Models Powered by Non-Redundant Priors

Towards Query-Efficient Adversarial Attacks Against Automatic Speech Recognition Systems

WordIllusion: An Adversarial Text Generation Algorithm Based on Human Cognitive System

Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

Rethinking Targeted Adversarial Attacks For Neural Machine Translation