Black-box Word-level Textual Adversarial Attack Based On Discrete Harris Hawks Optimization.

Tianrui Wang, Weina Niu, Kangyi Ding, Lingfeng Yao, Pengsen Cheng, Xiaosong Zhang
DOI: https://doi.org/10.1109/CSCWD57460.2023.10152713
2023-01-01
Abstract: Neural network-based applications are prone to being fooled by adversarial examples because of the inherent vulnerability of deep neural networks (DNNs). Textual adversarial attacks are particularly challenging because text is discrete. Word-level textual attacks, which are typically treated as optimization problems in black-box scenarios, craft adversarial examples that perform better in human evaluation. Existing approaches have struggled to balance success rate against time cost, mainly because the chosen optimization algorithm is not efficient enough. In this paper, we propose a method for generating textual adversarial examples called Discrete Harris Hawks Optimization (DHHO). We define three operations for handling discrete data and apply them to each stage of Harris Hawks Optimization (HHO), enabling it to solve optimization problems in discrete space. We conduct extensive experiments attacking BiLSTM and BERT on two benchmark data sets; our attack method achieves a success rate of up to 98% while reducing time cost by at least 50%. Moreover, the experimental results show that our adversarial examples maintain high quality and transferability.