Token-modification adversarial attacks for natural language processing: A survey

Tom Roth, Yansong Gao, Alsharif Abuadbba, Surya Nepal, Wei Liu
DOI: https://doi.org/10.3233/aic-230279
IF: 1.029
2024-04-03
AI Communications
Abstract: Many adversarial attacks target natural language processing systems, and most of them succeed by modifying the individual tokens of a document. Despite the apparent uniqueness of each attack, each is fundamentally a distinct configuration of four components: a goal function, allowable transformations, a search method, and constraints. In this survey, we systematically present the different components used throughout the literature, using an attack-independent framework that allows for easy comparison and categorisation of components. Our work aims to serve as a comprehensive guide for newcomers to the field and to spark targeted research into refining the individual attack components.
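To make the four-component view in the abstract concrete, here is a minimal Python sketch of a token-modification attack. It is an illustration only, not code from the survey: the word-counting "victim model", the synonym table, the greedy search, and every function name are hypothetical stand-ins chosen to keep the example self-contained.

```python
from typing import Dict, List, Optional

# Hypothetical toy victim model: counts sentiment words (assumption, not from the survey).
POSITIVE_WORDS = {"great", "good"}
NEGATIVE_WORDS = {"bad", "poor"}

# Hypothetical synonym table that defines the allowable transformations.
SYNONYMS: Dict[str, List[str]] = {"great": ["decent"], "good": ["okay"]}


def toy_score(tokens: List[str]) -> int:
    """Victim model: a score above 0 means the text is classified as positive."""
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return pos - neg


def goal_reached(tokens: List[str]) -> bool:
    """Goal function: the text is no longer classified as positive."""
    return toy_score(tokens) <= 0


def transformations(tokens: List[str]) -> List[List[str]]:
    """Transformations: every single-token synonym substitution."""
    out = []
    for i, tok in enumerate(tokens):
        for sub in SYNONYMS.get(tok, []):
            out.append(tokens[:i] + [sub] + tokens[i + 1:])
    return out


def satisfies_constraints(original: List[str], candidate: List[str],
                          max_changes: int) -> bool:
    """Constraint: at most max_changes tokens differ from the original."""
    changed = sum(a != b for a, b in zip(original, candidate))
    return changed <= max_changes


def greedy_attack(original: List[str], max_changes: int = 2) -> Optional[List[str]]:
    """Search method: greedily apply the substitution that lowers the score most."""
    current = list(original)
    while not goal_reached(current):
        candidates = [c for c in transformations(current)
                      if satisfies_constraints(original, c, max_changes)]
        if not candidates:
            return None  # no allowed modification remains
        best = min(candidates, key=toy_score)
        if toy_score(best) >= toy_score(current):
            return None  # no candidate makes progress
        current = best
    return current


if __name__ == "__main__":
    text = "a great film with good acting".split()
    print(greedy_attack(text, max_changes=2))
    print(greedy_attack(text, max_changes=1))
```

Swapping any one function (for example, a stricter constraint or a different search strategy) yields a different attack while the other three components stay unchanged, which is the kind of component-wise comparison the abstract describes.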
computer science, artificial intelligence