BERT Lost Patience Won't Be Robust to Adversarial Slowdown

Zachary Coalson, Gabriel Ritter, Rakesh Bobba, Sanghyun Hong
2023-10-31
Abstract: In this paper, we systematically evaluate the robustness of multi-exit language models against adversarial slowdown. To audit their robustness, we design a slowdown attack that generates natural adversarial text bypassing early-exit points. We use the resulting WAFFLE attack as a vehicle to conduct a comprehensive evaluation of three multi-exit mechanisms with the GLUE benchmark against adversarial slowdown. We then show our attack significantly reduces the computational savings provided by the three methods in both white-box and black-box settings. The more complex a mechanism is, the more vulnerable it is to adversarial slowdown. We also perform a linguistic analysis of the perturbed text inputs, identifying common perturbation patterns that our attack generates, and comparing them with standard adversarial text attacks. Moreover, we show that adversarial training is ineffective in defeating our slowdown attack, but input sanitization with a conversational model, e.g., ChatGPT, can remove perturbations effectively. This result suggests that future work is needed for developing efficient yet robust multi-exit models. Our code is available at: https://github.com/ztcoalson/WAFFLE
Machine Learning, Computation and Language, Cryptography and Security
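To make the abstract's "early-exit points" and "computational savings" concrete, here is a minimal Python sketch of two exit rules. It is not the authors' code. It assumes each internal classifier has already produced a probability vector; `entropy_exit` mimics a DeeBERT-style entropy threshold, `patience_exit` mimics a PABEE-style patience counter, and `computational_savings` is a simple illustrative measure (fraction of layers skipped), not necessarily the metric the paper reports. All function names, toy outputs, and threshold values are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: a simplified sketch, not the authors' implementation.


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_exit(layer_probs: List[List[float]], threshold: float) -> int:
    """DeeBERT-style rule: stop at the first exit whose prediction entropy
    is below `threshold`. Returns the number of layers executed (1-based)."""
    for i, probs in enumerate(layer_probs, start=1):
        if entropy(probs) < threshold:
            return i
    return len(layer_probs)  # no exit fired, so the full model runs


def patience_exit(layer_probs: List[List[float]], patience: int) -> int:
    """PABEE-style rule: stop once `patience` consecutive exits have each
    repeated the label predicted by the exit just before them."""
    streak, prev = 0, None
    for i, probs in enumerate(layer_probs, start=1):
        label = max(range(len(probs)), key=probs.__getitem__)
        streak = streak + 1 if label == prev else 0
        prev = label
        if streak >= patience:
            return i
    return len(layer_probs)


def computational_savings(layers_used: Sequence[int], num_layers: int) -> float:
    """Fraction of layer computations skipped versus always running every
    layer (ignores the small cost of the internal classifiers themselves)."""
    return 1.0 - sum(layers_used) / (len(layers_used) * num_layers)


if __name__ == "__main__":
    # Toy per-exit outputs for a 4-layer, 2-class model (illustrative only).
    confident = [[0.6, 0.4], [0.9, 0.1], [0.95, 0.05], [0.97, 0.03]]
    near_uniform = [[0.5, 0.5], [0.52, 0.48], [0.49, 0.51], [0.5, 0.5]]
    used = [entropy_exit(confident, 0.4), entropy_exit(near_uniform, 0.4)]
    print(used)                                  # [2, 4]
    print(computational_savings(used, 4))        # 0.25
    print(patience_exit(confident, 2), patience_exit(near_uniform, 2))  # 3 4
```

Near-uniform outputs at every exit push both rules all the way to the final layer, which is the behavior a slowdown attack tries to induce.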
### What problems does this paper attempt to solve?

This paper studies the robustness of multi-exit language models against adversarial slowdown attacks. Specifically, the authors design an attack named WAFFLE to evaluate whether these models retain their computational efficiency when given adversarially perturbed text inputs.

#### Main problems

1. **Robustness against adversarial slowdown**: Do existing multi-exit mechanisms keep their computational savings when facing adversarial text inputs? Are they equally vulnerable in white-box and black-box settings?
2. **Effectiveness of existing defenses**: Can existing defenses, such as adversarial training, withstand adversarial slowdown attacks?
3. **Characteristics of the adversarial text**: How do the perturbations produced by a slowdown attack differ from those of standard adversarial text attacks, and how do they affect multi-exit models?

#### Research background

In recent years, large-scale pre-trained language models such as BERT and T5 have made remarkable progress on natural language processing tasks. However, they typically require substantial memory and computation, which limits their use in real-time systems. Multi-exit mechanisms address this by attaching internal classifiers (early exits) to intermediate layers, so that inference can stop at a shallower layer once an exit criterion is met, reducing computational overhead. For example, DeeBERT exits when the entropy of an internal classifier's prediction falls below a threshold, while PABEE exits once the predictions of consecutive internal classifiers stay unchanged for a fixed number of layers (its "patience").

#### Research contributions

1. **The WAFFLE attack**: The authors design a new adversarial slowdown attack, WAFFLE, which generates natural adversarial text that prevents multi-exit models from terminating inference at early-exit points.
2. **Systematic evaluation of three multi-exit mechanisms**: Using the GLUE benchmark, the authors evaluate the robustness of three multi-exit mechanisms (DeeBERT, PABEE, and PastFuture) against adversarial slowdown.
3. **Effectiveness in black-box settings**: The authors show that WAFFLE also substantially reduces the computational savings of multi-exit models in black-box settings.
4. **Linguistic analysis**: Analyzing the adversarial text, the authors identify two key features, subject-verb disagreement and changes to named entities, that make perturbed inputs more likely to cause slowdown.
5. **Evaluation of defenses**: The authors test several potential defenses and find that adversarial training is ineffective, whereas input sanitization with a conversational model such as ChatGPT effectively removes the adversarial perturbations.

#### Conclusion

Multi-exit language models are vulnerable to adversarial slowdown attacks, and the more complex the mechanism, the more vulnerable it is. Future research is needed to develop multi-exit models that are both efficient and robust.
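The WAFFLE attack summarized above can be illustrated with a deliberately tiny, hypothetical example. The sketch below is not WAFFLE's implementation. It assumes a toy two-class multi-exit model (`toy_layer_probs`), a hand-written synonym table, and, as an assumed slowdown objective, the total KL divergence between each exit's output and the uniform distribution. A greedy loop then keeps the word substitutions that most reduce that objective, so that no exit becomes confident enough to fire. A real attack would also enforce semantic-similarity and fluency constraints to keep the text natural, which this sketch omits.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy vocabulary scores: positive values push toward class 1.
WORD_SCORE: Dict[str, float] = {
    "great": 2.0, "good": 1.0, "fine": 0.3, "really": 0.5, "somewhat": -0.2,
}
# Hypothetical synonym table standing in for a real candidate generator.
SYNONYMS: Dict[str, List[str]] = {
    "great": ["good", "fine"],
    "really": ["somewhat"],
    "movie": ["film"],
}
NUM_LAYERS = 6


def toy_layer_probs(words: List[str]) -> List[List[float]]:
    """Toy multi-exit model: every exit sees the same summed evidence, but
    deeper exits scale it up, so confidence grows with depth."""
    s = sum(WORD_SCORE.get(w, 0.0) for w in words)
    out = []
    for i in range(1, NUM_LAYERS + 1):
        p1 = 1.0 / (1.0 + math.exp(-s * i / NUM_LAYERS))
        out.append([1.0 - p1, p1])
    return out


def exit_layer(layer_probs: List[List[float]], conf: float = 0.8) -> int:
    """Confidence-based exit: first layer whose top probability reaches `conf`."""
    for i, probs in enumerate(layer_probs, start=1):
        if max(probs) >= conf:
            return i
    return len(layer_probs)


def kl_to_uniform(probs: List[float]) -> float:
    """KL(p || uniform) = log K - H(p); zero exactly when p is uniform."""
    return math.log(len(probs)) + sum(p * math.log(p) for p in probs if p > 0)


def slowdown_loss(words: List[str]) -> float:
    """Assumed objective: summed distance of every exit's output from uniform."""
    return sum(kl_to_uniform(p) for p in toy_layer_probs(words))


def greedy_slowdown(words: List[str], budget: int = 2) -> List[str]:
    """Apply up to `budget` synonym swaps, each time keeping the single swap
    that lowers the slowdown loss the most."""
    current = list(words)
    for _ in range(budget):
        best: Optional[List[str]] = None
        best_loss = slowdown_loss(current)
        for pos, word in enumerate(current):
            for cand in SYNONYMS.get(word, []):
                trial = current[:pos] + [cand] + current[pos + 1:]
                loss = slowdown_loss(trial)
                if loss < best_loss:
                    best, best_loss = trial, loss
        if best is None:
            break  # no remaining swap reduces the loss
        current = best
    return current


if __name__ == "__main__":
    clean = ["a", "really", "great", "movie"]
    adv = greedy_slowdown(clean)
    print(adv)                                  # ['a', 'somewhat', 'fine', 'movie']
    print(exit_layer(toy_layer_probs(clean)))   # 4
    print(exit_layer(toy_layer_probs(adv)))     # 6
```

On this toy input, two swaps move the exit from layer 4 to the final layer 6, a miniature version of the lost computational savings that a slowdown attack aims for.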