PAT: Geometry-Aware Hard-Label Black-Box Adversarial Attacks on Text

Muchao Ye, Jinghui Chen, Chenglin Miao, Han Liu, Ting Wang, Fenglong Ma
DOI: https://doi.org/10.1145/3580305.3599461
2023-01-01
Abstract: Despite a plethora of prior explorations, conducting text adversarial attacks in practical settings is still challenging with the following constraints: black box -- the inner structure of the victim model is unknown; hard label -- the attacker only has access to the top-1 prediction results; and semantic preservation -- the perturbation needs to preserve the original semantics. In this paper, we present PAT, a novel adversarial attack method employed under all these constraints. Specifically, PAT explicitly models the adversarial and non-adversarial prototypes and incorporates them to measure semantic changes for replacement selection in the hard-label black-box setting to generate high-quality samples. In each iteration, PAT finds original words that can be replaced back and selects better candidate words for perturbed positions in a geometry-aware manner guided by this estimation, which maximally improves the perturbation construction and minimally impacts the original semantics. Extensive evaluation with benchmark datasets and state-of-the-art models shows that PAT outperforms existing text adversarial attacks in terms of both attack effectiveness and semantic preservation. Moreover, we validate the efficacy of PAT against industry-leading natural language processing platforms in real-world settings.
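To make the hard-label black-box constraint concrete, the following is a minimal sketch of a generic "replace back" step: starting from a sample that is already adversarial, it restores original words one position at a time and keeps each restoration only if the top-1 label stays flipped. It is not PAT itself. It omits the prototype modeling and geometry-aware candidate selection the abstract describes, and `toy_victim`, `replace_back`, and the word lists are hypothetical names invented for illustration.

```python
from typing import Callable, List

def toy_victim(words: List[str]) -> int:
    """Hypothetical stand-in for a black-box victim model.

    The attacker sees only the returned top-1 label (1 = positive,
    0 = negative), never scores or gradients.
    """
    positive = {"good", "great", "fine", "fun"}
    negative = {"bad", "awful", "poor", "dull"}
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1 if score > 0 else 0

def replace_back(
    original: List[str],
    adversarial: List[str],
    predict: Callable[[List[str]], int],
) -> List[str]:
    """Greedily restore original words while the sample stays adversarial.

    Assumes both inputs have the same length (word substitution only).
    """
    true_label = predict(original)
    current = list(adversarial)
    if predict(current) == true_label:
        raise ValueError("starting sample is not adversarial")
    for i, orig_word in enumerate(original):
        if current[i] == orig_word:
            continue  # position is not perturbed
        trial = current[:i] + [orig_word] + current[i + 1:]
        # One hard-label query: keep the restoration only if the
        # prediction still differs from the original label.
        if predict(trial) != true_label:
            current = trial
    return current

original = "the movie was good and the cast was great".split()
adversarial = "the film was bad and the cast was awful".split()
result = replace_back(original, adversarial, toy_victim)
print(" ".join(result))
```

In this toy setup, two of the three substitutions can be undone without losing the label flip, which is the sense in which replacing words back reduces the semantic change of the perturbation. A fixed left-to-right order is used here only for simplicity; the abstract indicates that PAT instead guides these choices with its estimate of semantic change.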