Adversarial Search Engine Optimization for Large Language Models

Fredrik Nestaas, Edoardo Debenedetti, Florian Tramèr
2024-07-02
Abstract: Large Language Models (LLMs) are increasingly used in applications where the model selects from competing third-party content, such as in LLM-powered search engines or chatbot plugins. In this paper, we introduce Preference Manipulation Attacks, a new class of attacks that manipulate an LLM's selections to favor the attacker. We demonstrate that carefully crafted website content or plugin documentation can trick an LLM into promoting the attacker's products and discrediting competitors, thereby increasing user traffic and monetization. We show this leads to a prisoner's dilemma, where all parties are incentivized to launch attacks, but the collective effect degrades the LLM's outputs for everyone. We demonstrate our attacks on production LLM search engines (Bing and Perplexity) and plugin APIs (for GPT-4 and Claude). As LLMs are increasingly used to rank third-party content, we expect Preference Manipulation Attacks to emerge as a significant threat.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
This paper introduces a new class of attacks against large language models (LLMs) called "Preference Manipulation Attacks". LLMs are now commonly used to select among competing third-party content, for example in LLM-powered search engines or chatbot plugins. The authors find that an attacker can craft webpage content or plugin descriptions that bias the model toward recommending the attacker's content and disparaging competitors, thereby increasing the attacker's user traffic and revenue.

This creates a prisoner's dilemma: every participant has an incentive to launch an attack, but when all of them do, the quality of the LLM's outputs degrades for everyone.

The paper demonstrates the attacks on production LLM search engines (Bing and Perplexity) and on plugin APIs (for GPT-4 and Claude). The attacks are black-box and covert, and they reliably manipulate the LLM into promoting the attacker's content. For example, when Bing is asked to recommend a camera, the camera promoted by the attacker's page is more likely to be recommended than competing cameras.

Preference manipulation attacks also have more complex adversarial dynamics than traditional search engine optimization (SEO). They not only raise the attacker's ranking but also disparage competitors, and when multiple attackers target the same LLM, all participants appear less often in the search results.

The paper argues that as LLMs play a growing role in searching and ranking third-party content, preference manipulation attacks may become a serious real-world threat, and it recommends developing new defense mechanisms to protect search applications from them. The research team reported its findings to the potentially affected parties and will publicly disclose the results after a standard 90-day vulnerability disclosure period.
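The prisoner's dilemma mentioned above can be made concrete with a toy two-seller game. The sketch below is illustrative only: the payoff numbers and the names `PAYOFF`, `best_response`, and `is_nash` are hypothetical and are not taken from the paper. It only shows the general structure in which attacking is each seller's best response, yet mutual attack leaves both worse off than mutual restraint.

```python
from itertools import product

HONEST, ATTACK = "honest", "attack"
ACTIONS = (HONEST, ATTACK)

# Hypothetical payoffs (think "relative visibility in LLM answers").
# These values are assumptions for illustration, NOT results from the paper.
# PAYOFF[(a, b)] = (payoff to seller A, payoff to seller B)
PAYOFF = {
    (HONEST, HONEST): (3, 3),  # nobody attacks: both are recommended fairly
    (HONEST, ATTACK): (0, 5),  # B attacks alone: B is promoted, A is discredited
    (ATTACK, HONEST): (5, 0),  # A attacks alone: A is promoted, B is discredited
    (ATTACK, ATTACK): (1, 1),  # both attack: both end up less visible
}


def best_response(opponent_action):
    """Seller A's payoff-maximizing action against a fixed action of seller B."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, opponent_action)][0])


def is_nash(a, b):
    """True if neither seller can gain by unilaterally changing its action."""
    a_cannot_improve = all(PAYOFF[(a, b)][0] >= PAYOFF[(alt, b)][0] for alt in ACTIONS)
    b_cannot_improve = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, alt)][1] for alt in ACTIONS)
    return a_cannot_improve and b_cannot_improve


nash_equilibria = [(a, b) for a, b in product(ACTIONS, repeat=2) if is_nash(a, b)]

if __name__ == "__main__":
    for b in ACTIONS:
        print(f"Best response of A when B plays {b}: {best_response(b)}")
    print("Nash equilibria:", nash_equilibria)
    print("Total payoff if both attack:", sum(PAYOFF[(ATTACK, ATTACK)]))
    print("Total payoff if both stay honest:", sum(PAYOFF[(HONEST, HONEST)]))
```

Under these assumed payoffs, attacking is the best response to either action of the other seller, so the only equilibrium is mutual attack, even though both sellers would do better if neither attacked.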