Abstract: The robustness of large language models (LLMs) becomes increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered as a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the insertion of even a short prefix to the prompt leads to the generation of outputs far away from factually correct answers. We systematically evaluate the effect of such prefixes on RAG by introducing a novel optimization technique called Gradient Guided Prompt Perturbation (GGPP). GGPP achieves a high success rate in steering outputs of RAG-based LLMs to targeted wrong answers. It can also cope with instructions in the prompts requesting to ignore irrelevant context. We also exploit LLMs' neuron activation difference between prompts with and without GGPP perturbations to give a method that improves the robustness of RAG-based LLMs through a highly effective detector trained on neuron activation triggered by GGPP generated prompts. Our evaluation on open-sourced LLMs demonstrates the effectiveness of our methods.
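
To make the idea of a gradient-guided prefix search more concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a hypothetical mean-pooling encoder over a random embedding table, a hypothetical loss (cosine similarity to the originally retrieved passage minus cosine similarity to the attacker's target passage), and a greedy substitution loop. The names `embed`, `loss`, `perturb_prefix`, and all parameters are illustrative assumptions.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def embed(tokens, table):
    """Toy encoder (assumption): mean of the token vectors."""
    dim = len(table[0])
    return [sum(table[t][j] for t in tokens) / len(tokens) for j in range(dim)]

def loss(tokens, table, target, original):
    """Lower is better: close to the target passage, far from the original."""
    e = embed(tokens, table)
    return cosine(e, original) - cosine(e, target)

def cosine_grad(e, p):
    """Gradient of cosine(e, p) with respect to e."""
    n_e, n_p = norm(e), norm(p)
    c = dot(e, p) / (n_e * n_p)
    return [pj / (n_e * n_p) - c * ej / (n_e * n_e) for ej, pj in zip(e, p)]

def perturb_prefix(query, table, target, original, prefix_len=3, steps=20, top_k=5):
    """Greedy gradient-guided token substitution over a short prefix."""
    prefix = [0] * prefix_len
    for _ in range(steps):
        tokens = prefix + query
        e = embed(tokens, table)
        g_orig = cosine_grad(e, original)
        g_tgt = cosine_grad(e, target)
        # Gradient of the loss w.r.t. one token vector (mean pooling gives 1/n).
        grad = [(a - b) / len(tokens) for a, b in zip(g_orig, g_tgt)]
        best = (loss(tokens, table, target, original), prefix)
        for pos in range(prefix_len):
            cur = table[prefix[pos]]
            # First-order estimate of the loss change for every replacement token.
            est = [(dot([v - c for v, c in zip(table[t], cur)], grad), t)
                   for t in range(len(table))]
            # Re-score only the top_k most promising candidates with the true loss.
            for _, t in sorted(est)[:top_k]:
                cand = prefix[:pos] + [t] + prefix[pos + 1:]
                cand_loss = loss(cand + query, table, target, original)
                if cand_loss < best[0]:
                    best = (cand_loss, cand)
        if best[1] == prefix:
            break  # no candidate improved the true loss
        prefix = best[1]
    return prefix

rng = random.Random(0)
table = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]  # toy vocabulary
query = [3, 7, 11, 19]                        # token ids of the user question
original = embed(query, table)                # passage the clean query matches
target = [rng.gauss(0, 1) for _ in range(8)]  # attacker-chosen passage embedding

start = [0, 0, 0]
prefix = perturb_prefix(query, table, target, original)
loss_before = loss(start + query, table, target, original)
loss_after = loss(prefix + query, table, target, original)
print(prefix, loss_before, loss_after)
```

A real attack would compute gradients through the retriever's actual embedding model rather than a lookup table. The sketch only shows the general pattern: use the gradient to shortlist substitutions, then keep the substitution that lowers the true loss.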

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper aims to address the robustness issues of large language models (LLMs) when using retrieval-augmented generation (RAG) techniques. Specifically, the paper focuses on how minor input changes (e.g., inserting short prefixes in prompts) can affect the factual accuracy of RAG-generated outputs. The study finds that even very short prefix insertions can lead RAG to generate answers that deviate from factual correctness.

### Main Research Content

1. **Background**:
   - Large language models (LLMs) are increasingly used in various fields, but they suffer from hallucination issues, poor handling of long-tail factual knowledge, and inconsistent accuracy in extracting information from long contexts.
   - RAG is introduced to enhance the trustworthiness of LLMs by augmenting them with data retrieval capabilities, enabling them to generate text using reliable data sources and reducing factual errors.
   - However, the robustness of RAG against minor input changes has not been thoroughly studied.
2. **Research Methods**:
   - A new optimization technique called Gradient-Guided Prompt Perturbation (GGPP) is introduced to systematically evaluate the impact of prefixes on RAG outputs.
   - GGPP can guide RAG-generated answers to target incorrect answers with high success rates and can handle instructions in prompts that require ignoring irrelevant contexts.
   - By analyzing the differences in neuron activations of LLMs with and without GGPP perturbations in prompts, a highly effective detector based on neuron activations is proposed to improve RAG's robustness.
3. **Experimental Results**:
   - Evaluations on open-source LLMs show that the GGPP method is highly effective in altering retrieval results to point to target text passages.
   - The proposed ACT probe (based on neuron activations in the last layer of LLMs) provides a cost-effective defense measure, effectively detecting perturbed prompts.

### Main Contributions

1. **Revealing RAG's Robustness Issues**: The study shows that even minor prompt changes can lead RAG to generate factually incorrect answers.
2. **Proposing the GGPP Method**: Through Gradient-Guided Prompt Perturbation, the study successfully guides RAG-generated answers to target incorrect answers.
3. **Developing Detection Methods**: Neuron activation-based SATe and ACT probes are developed, effectively detecting prompt perturbations and factual errors, thereby improving RAG's robustness.

### Conclusion

The study emphasizes the importance of evaluating RAG's robustness in critical applications and proposes an effective method to enhance RAG's robustness and trustworthiness.
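
The activation-based detector mentioned in the summary can be illustrated with a small probe. The sketch below is a hedged illustration, not the paper's ACT probe: it assumes the probe is a logistic-regression classifier, and it replaces real last-layer LLM activations with synthetic Gaussian vectors whose mean differs between clean and perturbed prompts. The helper names and all hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_probe(xs, ys, lr=0.1, epochs=200):
    """Fit logistic-regression weights with batch gradient descent."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, gw)]
        b -= lr * gb / len(xs)
    return w, b

def predict(w, b, x):
    """Return 1 if the probe flags the activation vector as perturbed."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)

def synthetic_activations(rng, n, dim, shift):
    """Stand-in for LLM activations (assumption: perturbation shifts the mean)."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
dim = 16
clean = synthetic_activations(rng, 200, dim, -0.5)     # label 0
perturbed = synthetic_activations(rng, 200, dim, 0.5)  # label 1

# Train on the first 150 vectors of each class, evaluate on the remaining 50.
xs = clean[:150] + perturbed[:150]
ys = [0] * 150 + [1] * 150
w, b = train_probe(xs, ys)

test_x = clean[150:] + perturbed[150:]
test_y = [0] * 50 + [1] * 50
accuracy = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the feature vectors would be activations recorded from the model while it processes clean and GGPP-perturbed prompts. A linear probe is cheap to train and run, which fits the summary's description of the defense as cost-effective.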

Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
