Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models

Zhibo Hu, Chen Wang, Yanfeng Shu, Helen Paik, Liming Zhu
2024-07-24
Abstract: The robustness of large language models (LLMs) becomes increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered as a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the insertion of even a short prefix to the prompt leads to the generation of outputs far away from factually correct answers. We systematically evaluate the effect of such prefixes on RAG by introducing a novel optimization technique called Gradient Guided Prompt Perturbation (GGPP). GGPP achieves a high success rate in steering outputs of RAG-based LLMs to targeted wrong answers. It can also cope with instructions in the prompts requesting to ignore irrelevant context. We also exploit LLMs' neuron activation difference between prompts with and without GGPP perturbations to give a method that improves the robustness of RAG-based LLMs through a highly effective detector trained on neuron activation triggered by GGPP generated prompts. Our evaluation on open-sourced LLMs demonstrates the effectiveness of our methods.
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper aims to address the robustness issues of large language models (LLMs) when using retrieval-augmented generation (RAG) techniques. Specifically, the paper focuses on how minor input changes (e.g., inserting short prefixes in prompts) can affect the factual accuracy of RAG-generated outputs. The study finds that even very short prefix insertions can lead RAG to generate answers that deviate from factual correctness.

### Main Research Content

1. **Background**:
   - Large language models (LLMs) are increasingly used in various fields, but they suffer from hallucination issues, poor handling of long-tail factual knowledge, and inconsistent accuracy in extracting information from long contexts.
   - RAG is introduced to enhance the trustworthiness of LLMs by augmenting them with data retrieval capabilities, enabling them to generate text using reliable data sources and reducing factual errors.
   - However, the robustness of RAG against minor input changes has not been thoroughly studied.
2. **Research Methods**:
   - A new optimization technique called Gradient-Guided Prompt Perturbation (GGPP) is introduced to systematically evaluate the impact of prefixes on RAG outputs.
   - GGPP can guide RAG-generated answers to target incorrect answers with high success rates and can handle instructions in prompts that require ignoring irrelevant contexts.
   - By analyzing the differences in neuron activations of LLMs with and without GGPP perturbations in prompts, a highly effective detector based on neuron activations is proposed to improve RAG's robustness.
3. **Experimental Results**:
   - Evaluations on open-source LLMs show that the GGPP method is highly effective in altering retrieval results to point to target text passages.
   - The proposed ACT probe (based on neuron activations in the last layer of LLMs) provides a cost-effective defense measure, effectively detecting perturbed prompts.

### Main Contributions

1. **Revealing RAG's Robustness Issues**: The study shows that even minor prompt changes can lead RAG to generate factually incorrect answers.
2. **Proposing the GGPP Method**: Through Gradient-Guided Prompt Perturbation, the study successfully guides RAG-generated answers to target incorrect answers.
3. **Developing Detection Methods**: Neuron activation-based SATe and ACT probes are developed, effectively detecting prompt perturbations and factual errors, thereby improving RAG's robustness.

### Conclusion

The study emphasizes the importance of evaluating RAG's robustness in critical applications and proposes an effective method to enhance RAG's robustness and trustworthiness.
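
### Illustrative Sketch: An Activation-Based Detector

The detection idea summarized above (a classifier trained on neuron activations to separate clean prompts from perturbed ones) can be illustrated with a minimal sketch. This is not the paper's ACT probe implementation. The classifier type (logistic regression), the function names `make_synthetic_activations`, `train_probe`, and `predict`, and all numeric settings are assumptions made for illustration. The "activations" are synthetic Gaussian vectors, with the perturbed class shifted on a few coordinates, and are not values extracted from any LLM.

```python
import numpy as np


def make_synthetic_activations(n_per_class=200, dim=32, shift=2.0, seed=0):
    """Return (X, y) of stand-in activation vectors. NOT real LLM data.

    Assumption: perturbed prompts shift the mean of a few "neurons".
    Label 0 = clean prompt, label 1 = perturbed prompt.
    """
    rng = np.random.default_rng(seed)
    clean = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    perturbed = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    perturbed[:, :4] += shift  # hypothetical activation difference
    X = np.vstack([clean, perturbed])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def train_probe(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(perturbed)
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of mean log loss
        b -= lr * float(np.mean(p - y))
    return w, b


def predict(X, w, b):
    """Return 1 where the probe flags a prompt as perturbed, else 0."""
    return (X @ w + b > 0).astype(int)


if __name__ == "__main__":
    X_train, y_train = make_synthetic_activations(seed=0)
    X_test, y_test = make_synthetic_activations(seed=1)
    w, b = train_probe(X_train, y_train)
    accuracy = float(np.mean(predict(X_test, w, b) == y_test))
    print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

In a real setting, the rows of `X` would be replaced by activation vectors recorded from the model for clean and perturbed prompts. The accuracy printed here reflects only how separable the synthetic data is and says nothing about the results reported in the paper.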