Adversarial Search Engine Optimization for Large Language Models

Fredrik Nestaas, Edoardo Debenedetti, Florian Tramèr

2024-07-02

Abstract: Large Language Models (LLMs) are increasingly used in applications where the model selects from competing third-party content, such as in LLM-powered search engines or chatbot plugins. In this paper, we introduce Preference Manipulation Attacks, a new class of attacks that manipulate an LLM's selections to favor the attacker. We demonstrate that carefully crafted website content or plugin documentation can trick an LLM into promoting the attacker's products and discrediting competitors, thereby increasing user traffic and monetization. We show this leads to a prisoner's dilemma, where all parties are incentivized to launch attacks, but the collective effect degrades the LLM's outputs for everyone. We demonstrate our attacks on production LLM search engines (Bing and Perplexity) and plugin APIs (for GPT-4 and Claude). As LLMs are increasingly used to rank third-party content, we expect Preference Manipulation Attacks to emerge as a significant threat.

Cryptography and Security, Machine Learning

What problem does this paper attempt to address?

This paper introduces a new class of attacks against large language models (LLMs) called Preference Manipulation Attacks. LLMs are increasingly used to select from competing third-party content, for example in LLM-powered search engines and chatbot plugins. The authors find that an attacker can craft webpage content or plugin descriptions that bias the model towards recommending the attacker's content and discrediting competitors, thereby increasing the attacker's user traffic and revenue.

The paper demonstrates these attacks on production LLM search engines (Bing and Perplexity) and on plugin APIs for GPT-4 and Claude. The attacks are black-box and covert, and they reliably manipulate the LLM into promoting the attacker's content. For example, when Bing is asked to recommend a camera, the camera whose page has been manipulated is more likely to be recommended than competing cameras.

Preference Manipulation Attacks also have more complex adversarial dynamics than traditional search engine optimization (SEO). An attack not only raises the attacker's ranking but also disparages competitors, and when multiple attackers target the same LLM, every participant appears less often in the results. This is a prisoner's dilemma: each party has an incentive to attack, yet the collective effect is to degrade the quality of the LLM's outputs for everyone.

The paper argues that as LLMs play an increasingly influential role in searching and ranking third-party content, Preference Manipulation Attacks may become a serious real-world threat, and it recommends developing new defense mechanisms to protect search applications. The research team has reported its findings to the potentially affected parties and will publicly disclose the results after a standard 90-day vulnerability disclosure period.
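The prisoner's dilemma described above can be made concrete with a small payoff table. The following is a minimal sketch, not the paper's model: the two-vendor setup, the `PAYOFFS` table, and every number in it are hypothetical values chosen only to illustrate the incentive structure (attacking is each vendor's best response regardless of what the other does, yet mutual attack leaves both worse off than mutual restraint).

```python
# Hypothetical payoffs: the share of LLM recommendations that each of two
# competing vendors receives. All numbers are illustrative assumptions,
# not measurements from the paper.
PAYOFFS = {
    # (a_attacks, b_attacks): (share_a, share_b)
    (False, False): (0.40, 0.40),  # nobody attacks
    (True, False): (0.60, 0.10),   # only A attacks
    (False, True): (0.10, 0.60),   # only B attacks
    (True, True): (0.25, 0.25),    # both attack; both shares fall
}


def attacking_is_best_response(opponent_attacks: bool) -> bool:
    """Return True if vendor A gains a larger share by attacking."""
    share_if_attack = PAYOFFS[(True, opponent_attacks)][0]
    share_if_abstain = PAYOFFS[(False, opponent_attacks)][0]
    return share_if_attack > share_if_abstain


def is_prisoners_dilemma() -> bool:
    """Check the two defining conditions from vendor A's point of view.

    The table is symmetric, so the same result holds for vendor B.
    """
    # Condition 1: attacking is better whatever the opponent does.
    attack_dominates = (
        attacking_is_best_response(False) and attacking_is_best_response(True)
    )
    # Condition 2: mutual attack is worse than mutual restraint.
    mutual_attack_is_worse = PAYOFFS[(True, True)][0] < PAYOFFS[(False, False)][0]
    return attack_dominates and mutual_attack_is_worse


if __name__ == "__main__":
    print("Attack is a dominant strategy:",
          attacking_is_best_response(False) and attacking_is_best_response(True))
    print("Game is a prisoner's dilemma:", is_prisoners_dilemma())
```

With these assumed values both checks hold. Changing the `(True, True)` entry to a share at or above 0.40 would remove the dilemma, which shows that the argument depends on attacks lowering every participant's visibility when several vendors attack at once.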

Universal and Transferable Adversarial Attacks on Aligned Language Models

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

Exploring the Adversarial Capabilities of Large Language Models

Ranking Manipulation for Conversational Search Engines

The Philosopher's Stone: Trojaning Plugins of Large Language Models

Generating Valid and Natural Adversarial Examples with Large Language Models

Misusing Tools in Large Language Models With Visual Adversarial Examples

Adversarial Evasion Attack Efficiency against Large Language Models

Manipulating Large Language Models to Increase Product Visibility

Adversarial Attacks on Large Language Models in Medicine

Assessing Adversarial Robustness of Large Language Models: An Empirical Study

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

PAL: Proxy-Guided Black-Box Attack on Large Language Models

Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Adversarial Attacks on Large Language Models Using Regularized Relaxation

Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content