Abstract: Deep neural networks, which play a pivotal role in fields such as image, text, and audio processing, are vulnerable to adversarial attacks. The vast majority of current textual adversarial attacks are configured for a black-box soft-label setting, in which the attack relies on the gradient information or confidence scores of the model. Implementing adversarial attacks using only the top predicted label of a hard-label model is therefore both more challenging and more realistic. Existing hard-label adversarial attack methods use population-based genetic optimization algorithms. However, this approach consumes a large number of queries, which is a considerable shortcoming. To solve this problem, we propose a new textual black-box hard-label adversarial attack algorithm based on the idea of differential evolution of populations, called the text-based differential evolution (TDE) algorithm. First, the method judges the importance of the words in the initial rough adversarial example. Only the keywords in the sentence are then operated on, and the remaining words are gradually replaced with the original words, which reduces the number of substituted words in the sentence. During this replacement process, our method evaluates the semantic similarity of the adversarial examples and deposits high-quality individuals into the population. Second, the adversarial examples are combined and optimized according to word importance. Compared with existing methods guided by genetic algorithms, our method avoids a large number of meaningless repetitive queries and significantly improves both the overall attack efficiency and the semantic quality of the generated adversarial examples.
We conduct experiments on multiple datasets covering three text tasks (sentiment classification, natural language inference, and toxic comment detection), and also perform experimental comparisons on models and APIs in realistic scenarios. For example, in the adversarial attack experiment on the Google Cloud commercial API, compared to the existing hard-label method, our method reduces the average number of queries required for the attack from 6986 to 176 and increases semantic similarity from 0.844 to 0.876. Extensive experimental data show that our approach not only significantly reduces the number of queries but also significantly outperforms existing methods in the quality of adversarial examples.
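
The first stage described in the abstract, restoring original words while the hard-label prediction stays flipped, can be illustrated with a minimal sketch. This is not the authors' implementation: it omits word-importance ranking, semantic-similarity scoring, and the differential-evolution population, and the names `restore_original_words` and `toy_predict` are hypothetical. The toy classifier is a stand-in for a real hard-label model or API that returns only a top label.

```python
from typing import Callable, List


def restore_original_words(
    original: List[str],
    adversarial: List[str],
    predict_label: Callable[[List[str]], int],
    true_label: int,
) -> List[str]:
    """Greedily put original words back while the text stays adversarial.

    Simplified illustration only: each position is tried once, left to right,
    and every trial costs exactly one hard-label query.
    """
    current = list(adversarial)
    for i, (orig_word, adv_word) in enumerate(zip(original, adversarial)):
        if orig_word == adv_word:
            continue  # nothing was substituted at this position
        candidate = current[:i] + [orig_word] + current[i + 1:]
        # Keep the restoration only if the model is still fooled.
        if predict_label(candidate) != true_label:
            current = candidate
    return current


def toy_predict(tokens: List[str]) -> int:
    """Hypothetical hard-label model: label 1 if 'good' appears, else 0."""
    return 1 if "good" in tokens else 0


original = ["the", "movie", "was", "good", "fun"]
adversarial = ["a", "film", "was", "fine", "fun"]  # rough initial example
reduced = restore_original_words(original, adversarial, toy_predict, true_label=1)

# Positions that still differ from the original are the words the attack
# actually needs; later optimization can focus on these.
changed_positions = [i for i, (o, r) in enumerate(zip(original, reduced)) if o != r]
print(reduced, changed_positions)
```

In this toy run only the substitution at position 3 survives, since restoring "good" would flip the prediction back to the true label. The other two substitutions are undone at the cost of one query each.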

Efficient Text-Based Evolution Algorithm to Hard-Label Adversarial Attacks on Text
