Abstract: In this paper, we systematically evaluate the robustness of multi-exit language models against adversarial slowdown. To audit their robustness, we design a slowdown attack that generates natural adversarial text bypassing early-exit points. We use the resulting WAFFLE attack as a vehicle to conduct a comprehensive evaluation of three multi-exit mechanisms with the GLUE benchmark against adversarial slowdown. We then show our attack significantly reduces the computational savings provided by the three methods in both white-box and black-box settings. The more complex a mechanism is, the more vulnerable it is to adversarial slowdown. We also perform a linguistic analysis of the perturbed text inputs, identifying common perturbation patterns that our attack generates, and comparing them with standard adversarial text attacks. Moreover, we show that adversarial training is ineffective in defeating our slowdown attack, but input sanitization with a conversational model, e.g., ChatGPT, can remove perturbations effectively. This result suggests that future work is needed for developing efficient yet robust multi-exit models. Our code is available at: https://github.com/ztcoalson/WAFFLE

What problem does this paper attempt to address?

This paper studies the robustness of multi-exit language models against adversarial slowdown attacks. Specifically, the authors design an attack named WAFFLE to evaluate whether these models keep their computational efficiency when given adversarially perturbed text inputs.

#### Main problems

1. **Robustness against adversarial slowdown attacks**: Can existing multi-exit mechanisms keep their computational savings when facing adversarial text inputs? Are these mechanisms equally vulnerable in white-box and black-box settings?
2. **Effectiveness of existing defense measures**: Can existing defenses, such as adversarial training, resist adversarial slowdown attacks?
3. **Characteristics of adversarial texts**: How do the perturbation patterns in the generated adversarial texts differ from those of standard adversarial text attacks? How do these texts affect multi-exit models?

#### Research background

In recent years, large-scale pre-trained language models (such as BERT and T5) have made remarkable progress on natural language processing tasks. However, these models usually require a large amount of memory and computation, which limits their use in real-time systems. To address this, researchers have proposed multi-exit mechanisms: internal classifiers (early exits) are attached at different layers of the model, so that inference can terminate at a shallower layer and computational overhead is reduced.

#### Research contributions

1. **Proposing the WAFFLE attack**: The authors design a new adversarial slowdown attack named WAFFLE. It generates natural adversarial texts that prevent multi-exit models from terminating inference at the early-exit points.
2. **Systematically evaluating three multi-exit mechanisms**: The authors evaluate the robustness of three multi-exit mechanisms (DeeBERT, PABEE, PastFuture) under adversarial slowdown attacks using the GLUE benchmark.
3. **Effectiveness in black-box scenarios**: The authors show that the WAFFLE attack also reduces the computational savings of multi-exit models in black-box scenarios.
4. **Linguistic analysis**: Through a linguistic analysis of the adversarial texts, the authors identify two key features, subject-verb disagreement and changed named entities, that make adversarial texts more likely to cause slowdown.
5. **Evaluation of defense measures**: The authors test several potential defenses and find that adversarial training is ineffective, while input sanitization can effectively remove the adversarial perturbations.

#### Conclusion

Multi-exit language models are vulnerable to adversarial slowdown attacks, and more complex mechanisms are more vulnerable. Future research needs to develop multi-exit models that are both efficient and robust.
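
The early-exit idea described in the research background above can be illustrated with a minimal sketch. It is not the paper's code: it assumes an entropy-threshold exit rule of the kind DeeBERT uses, and the function names (`entropy`, `argmax`, `early_exit`), the threshold value, and the per-layer probabilities are all hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


def early_exit(exit_probs: List[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (exit_layer, predicted_class) for a single input.

    exit_probs[i] is the class distribution produced by the internal
    classifier attached after layer i + 1 (hypothetical inputs; a real
    model would compute them layer by layer and stop as soon as it exits).
    """
    for layer, probs in enumerate(exit_probs, start=1):
        # Low entropy means the internal classifier is confident: stop here.
        if entropy(probs) < threshold:
            return layer, argmax(probs)
    # No internal classifier was confident: fall back to the final layer.
    return len(exit_probs), argmax(exit_probs[-1])


# Made-up outputs of a 4-layer model with one exit per layer.
clean = [[0.60, 0.40], [0.95, 0.05], [0.99, 0.01], [0.99, 0.01]]
# A slowdown-style input keeps every exit close to uniform (high entropy).
perturbed = [[0.50, 0.50], [0.55, 0.45], [0.52, 0.48], [0.60, 0.40]]

clean_exit = early_exit(clean, threshold=0.3)
perturbed_exit = early_exit(perturbed, threshold=0.3)
print(clean_exit, perturbed_exit)
```

In this toy setting the clean input leaves at the second exit, while the perturbed input never satisfies the threshold and is processed by all four layers. That loss of computational savings is the effect an adversarial slowdown attack aims for.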

BERT Lost Patience Won't Be Robust to Adversarial Slowdown

SlowBERT: Slow-down Attacks on Input-adaptive Multi-exit BERT

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

Dynamic Transformers Provide a False Sense of Efficiency

On Evaluating Adversarial Robustness of Large Vision-Language Models

Towards improving fast adversarial training in multi-exit network

Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models

Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation

SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference.

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based Language Models

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

White-Box Multi-Objective Adversarial Attack on Dialogue Generation

Flooding-X: Improving BERT's Resistance to Adversarial Attacks Via Loss-Restricted Fine-Tuning.

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Robust LLM safeguarding via refusal feature adversarial training

Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods

FRACTURED-SORRY-Bench: Framework for Revealing Attacks in Conversational Turns Undermining Refusal Efficacy and Defenses over SORRY-Bench (Automated Multi-shot Jailbreaks)

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats