Abstract: Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but to what extent can they be safely deployed in real-world applications? In this work, we propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages by perturbing discrete tokens to maximize similarity with a provided set of training queries. When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems to retrieve them for queries that were not seen by the attacker. More surprisingly, these adversarial passages can directly generalize to out-of-domain queries and corpora with a high success attack rate -- for instance, we find that 50 generated passages optimized on Natural Questions can mislead >94% of questions posed in financial documents or online forums. We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised. Although different systems exhibit varying levels of vulnerability, we show they can all be successfully attacked by injecting up to 500 passages, a small fraction compared to a retrieval corpus of millions of passages.

What problem does this paper attempt to address?

The paper addresses **the security of dense retrieval systems in practical applications**. Specifically, the authors propose a new attack: by injecting a small number of adversarial passages into the retrieval corpus, an attacker misleads the dense retrieval system into returning those passages as query results. The attack is effective not only for the queries used to generate the passages, but also generalizes to unseen queries, including cross-domain queries.

### Main research content:

1. **Problem definition**:
   - The authors focus on the security of dense retrieval systems (such as DPR and ANCE) in practical applications.
   - They propose a new type of corpus poisoning attack, in which a malicious user generates a small number of adversarial passages and inserts them into the retrieval corpus to mislead the dense retrieval system.
2. **Attack method**:
   - **Optimization process**: Use a gradient-based token replacement method (inspired by HotFlip) to generate adversarial passages that maximize similarity with a set of training queries.
   - **Multi-passage generation**: Group similar queries with a clustering algorithm (such as k-means) and generate one adversarial passage for each query group.
3. **Experimental setup**:
   - Run experiments on the BEIR benchmark, with Natural Questions and MS MARCO as the two main datasets.
   - Evaluate the impact of different numbers of adversarial passages on various dense retrieval models (such as Contriever, DPR, and ANCE).
4. **Experimental results**:
   - Even a small number of adversarial passages (such as 10 or 50) can significantly degrade the performance of the dense retrieval system and mislead it into returning wrong results.
   - The adversarial passages are effective not only for queries in the training set, but also generalize to unseen queries and to cross-domain queries (such as questions about financial documents or from online forums).
5. **Case study**:
   - The authors also explore generating adversarial passages that contain specific misinformation, showing the potential harm of this attack in practical applications.

### Conclusion:

- This research reveals a new vulnerability of dense retrieval systems in practical applications: a small number of adversarial passages can significantly affect the system's retrieval results.
- These findings matter for the future design and deployment of dense retrieval systems, whose robustness and security need further improvement.

### Significance:

- **Security improvement**: This research alerts researchers and developers to the security threats that dense retrieval systems may face in practical applications, and to the need for measures that improve system robustness.
- **Defense strategy**: Although adversarial passages can be generated, the research also points to potential defense strategies, such as detecting unnatural passages.

### Formula display:

- **Similarity calculation**:

  \[ a=\arg \max_{a'}\frac{1}{|Q|}\sum_{q_i\in Q}E_q(q_i)^T E_p(a') \]

  where \(Q\) is the set of training queries, and \(E_q(q_i)\) and \(E_p(a')\) are the embedding vectors of query \(q_i\) and adversarial passage \(a'\) respectively.
- **Best replacement candidate**:

  \[ \arg \max_{t'_i\in V}\frac{1}{|Q|}\sum_{q\in Q}e_{t'_i}^T\nabla_{e_{t_i}}\text{sim}(q, a) \]

  where \(V\) is the vocabulary, \(e_{t'_i}\) is the embedding vector of token \(t'_i\) in the vocabulary, and \(\nabla_{e_{t_i}}\text{sim}(q, a)\) is the gradient of the similarity with respect to the embedding vector of the current token \(t_i\).
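
The best-replacement-candidate formula can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes a toy passage encoder that mean-pools token embeddings, so the gradient of the similarity with respect to a token embedding can be written in closed form, whereas a real dense retriever would need backpropagation through the encoder. All names here (`mean_pool_encode`, `hotflip_step`, `generate_adversarial_passage`, the vocabulary size, and the random queries) are hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_pool_encode(token_ids, emb):
    """Toy passage encoder (an assumption): mean of the token embeddings."""
    return emb[token_ids].mean(axis=0)

def avg_similarity(token_ids, emb, query_vecs):
    """Average dot-product similarity between the passage and all queries."""
    return float((query_vecs @ mean_pool_encode(token_ids, emb)).mean())

def hotflip_step(token_ids, pos, emb, query_vecs, top_k=5):
    """Try to swap the token at `pos`, using first-order candidate scores."""
    n = len(token_ids)
    # For the toy encoder, d sim(q, a) / d e_{t_i} = q / n, so the gradient
    # averaged over the query set is mean(q) / n.
    avg_grad = query_vecs.mean(axis=0) / n
    # Score every vocabulary token t' by e_{t'}^T (averaged gradient).
    scores = emb @ avg_grad
    candidates = np.argsort(-scores)[:top_k]
    best_ids = token_ids
    best_sim = avg_similarity(token_ids, emb, query_vecs)
    # Re-evaluate the true objective for the top candidates and keep a swap
    # only if it actually improves the average similarity.
    for cand in candidates:
        trial = token_ids.copy()
        trial[pos] = cand
        sim = avg_similarity(trial, emb, query_vecs)
        if sim > best_sim:
            best_ids, best_sim = trial, sim
    return best_ids, best_sim

def generate_adversarial_passage(emb, query_vecs, length=8, steps=50, seed=0):
    """Start from random tokens and greedily replace one position per step."""
    rng = np.random.default_rng(seed)
    token_ids = rng.integers(0, emb.shape[0], size=length)
    sim = avg_similarity(token_ids, emb, query_vecs)
    for _ in range(steps):
        pos = int(rng.integers(0, length))
        token_ids, sim = hotflip_step(token_ids, pos, emb, query_vecs)
    return token_ids, sim

# Hypothetical toy data: 200-token vocabulary, 16-dim embeddings, 30 queries.
data_rng = np.random.default_rng(42)
emb = data_rng.normal(size=(200, 16))
query_vecs = data_rng.normal(size=(30, 16)) + 1.0

# Same random initial passage that generate_adversarial_passage(seed=0) uses.
start_ids = np.random.default_rng(0).integers(0, 200, size=8)
start_sim = avg_similarity(start_ids, emb, query_vecs)
adv_ids, adv_sim = generate_adversarial_passage(emb, query_vecs)
print(f"average similarity: start={start_sim:.3f}, optimized={adv_sim:.3f}")
```

Because a swap is accepted only when the re-evaluated similarity improves, the objective never decreases from one step to the next. In the multi-passage setting described above, the same loop would be run once per query cluster, with `query_vecs` restricted to the queries in that cluster.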

Poisoning Retrieval Corpora by Injecting Adversarial Passages

Corpus Poisoning via Approximate Greedy Gradient Descent

Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models

Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning

On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains

Deep Learning Based Dense Retrieval: A Comparative Study

PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Concealed Data Poisoning Attacks on NLP Models

Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

Does Vec2Text Pose a New Corpus Poisoning Threat?

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

Poisoning Web-Scale Training Datasets is Practical

Efficient Trigger Word Insertion

HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models

Defending against Insertion-based Textual Backdoor Attacks via Attribution

Poison Attack and Defense on Deep Source Code Processing Models

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Circumventing Backdoor Defenses That Are Based on Latent Separability