Abstract: The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors like typos or synonyms, aim to evaluate how slight deviations can affect LLM outcomes while maintaining semantic integrity. These prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. Furthermore, we present a comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users.
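The character-, word-, and sentence-level perturbations named in the abstract can be illustrated with a minimal Python sketch. It makes no claim about the benchmark's actual code: the function names (`char_perturb`, `word_perturb`, `sentence_perturb`), the tiny `SYNONYMS` table, and the distractor clause are hypothetical, and edits are picked at random instead of being searched adversarially against a target model as a real attack would do. Semantic-level perturbations are left out because they change the prompt's phrasing as a whole, which simple string edits cannot reproduce.

```python
import random

# Hypothetical synonym table, used only for this illustration.
SYNONYMS = {
    "classify": ["categorize", "label"],
    "sentiment": ["emotion", "feeling"],
    "sentence": ["statement", "phrase"],
}


def char_perturb(prompt: str, n_edits: int = 1, seed: int = 0) -> str:
    """Character-level sketch: swap two adjacent letters inside random words (typo-like)."""
    rng = random.Random(seed)
    words = prompt.split()
    # Only words with at least two characters have adjacent letters to swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    for _ in range(min(n_edits, len(candidates))):
        i = rng.choice(candidates)
        w = words[i]
        j = rng.randrange(len(w) - 1)
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def word_perturb(prompt: str, seed: int = 0) -> str:
    """Word-level sketch: replace each word found in SYNONYMS with a random synonym."""
    rng = random.Random(seed)
    out = []
    for w in prompt.split():
        key = w.lower()
        out.append(rng.choice(SYNONYMS[key]) if key in SYNONYMS else w)
    return " ".join(out)


def sentence_perturb(prompt: str, distractor: str = "Note that the sky is blue.") -> str:
    """Sentence-level sketch: append an irrelevant sentence that leaves the task unchanged."""
    return f"{prompt} {distractor}"


if __name__ == "__main__":
    prompt = "Classify the sentiment of the following sentence as positive or negative"
    print(char_perturb(prompt, n_edits=2))
    print(word_perturb(prompt))
    print(sentence_perturb(prompt))
```

Each variant keeps the task's meaning intact for a human reader, which is the property the benchmark relies on when it attributes any accuracy change to the model's sensitivity rather than to a changed task.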

What problem does this paper attempt to address?

This paper evaluates the robustness of large language models (LLMs) against adversarial prompts. Specifically, the authors introduce a benchmark named PromptRobust, designed to systematically measure and analyze how LLMs perform when faced with various adversarial prompts.

### Problem Background

As LLMs are deployed across many fields, especially safety-critical and decision-support applications, it is crucial that they be robust to input perturbations. However, existing research focuses mainly on adversarial examples (perturbed input samples) and overlooks the impact of adversarial prompts. Adversarial prompts are slightly modified versions of an original prompt (for example, with misspellings or synonym substitutions) that may cause an LLM to produce incorrect responses.

### Core Problems of the Paper

1. **Robustness Evaluation**: Are current LLMs robust to adversarial prompts?
2. **Influencing Factors**: What factors make LLMs vulnerable to adversarial prompts?
3. **Improvement Strategies**: How can the robustness of LLMs against adversarial prompts be improved?

### Main Contributions

1. **PromptRobust Benchmark**: A comprehensive benchmark for evaluating the robustness of LLMs against different types of adversarial prompts.
2. **Comprehensive Evaluation and Analysis**: Extensive experiments on 8 tasks and 13 datasets reveal how LLMs perform under different attacks, supported by visual explanations and a transferability analysis.
3. **Practical Guidance**: Practical recommendations that help researchers and users design more robust prompts.

### Experimental Methods

- **Prompt Types**: Four prompt types: task-oriented and role-oriented prompts, each used in zero-shot and few-shot settings.
- **Attack Types**: Four attack levels: character-level, word-level, sentence-level, and semantic-level.
- **Evaluation Metric**: The Performance Drop Rate (PDR) serves as a unified metric that quantifies how much a model's performance changes under adversarial prompts.

### Experimental Results

- **Overall Lack of Robustness**: Current LLMs generally lack robustness to adversarial prompts. Word-level attacks are the most effective, causing an average performance drop of 33%.
- **Model Differences**: Robustness to adversarial prompts varies significantly across LLMs. GPT-4 and UL2 perform relatively well, while Vicuna is highly vulnerable.
- **Transferability Analysis**: Adversarial prompts transfer between models only to a limited extent, which suggests that prompts crafted to attack one model rarely work as well against others.

Through this study, the authors emphasize the importance of evaluating and improving the robustness of LLMs against adversarial prompts and offer directions and suggestions for future research.
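The Performance Drop Rate can be sketched in a few lines of Python. This assumes PDR is the relative drop of a summed per-example metric, that is, one minus the ratio of the score obtained with the adversarial prompt to the score obtained with the clean prompt over the same examples; the function name `performance_drop_rate` and the 0/1 correctness scores in the example are hypothetical.

```python
from typing import Sequence


def performance_drop_rate(clean_scores: Sequence[float], adv_scores: Sequence[float]) -> float:
    """Assumed PDR: 1 - sum(adversarial scores) / sum(clean scores).

    Both sequences hold a per-example metric value (e.g., 1.0 for a correct
    answer, 0.0 for a wrong one) for the same examples, evaluated once with
    the clean prompt and once with its adversarial counterpart.
    """
    if len(clean_scores) != len(adv_scores):
        raise ValueError("score lists must cover the same examples")
    clean_total = sum(clean_scores)
    if clean_total == 0:
        raise ValueError("clean performance is zero; the ratio is undefined")
    return 1.0 - sum(adv_scores) / clean_total


if __name__ == "__main__":
    clean = [1, 1, 1, 1, 0]  # 4 of 5 correct with the clean prompt
    adv = [1, 0, 1, 1, 0]    # 3 of 5 correct with the adversarial prompt
    print(performance_drop_rate(clean, adv))  # 0.25
```

Normalizing by the clean score makes the metric comparable across tasks and datasets with different baseline accuracy. Under this definition a value of 0 means no change, and a negative value means the perturbed prompt actually improved performance.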

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
