Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Xu Guo, Dayong Ye, Wanlei Zhou, Philip S. Yu

2024-06-18

Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including machine translation, chatbots, and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with five specific scenarios: pre-training, fine-tuning, retrieval-augmented generation systems, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines potential threats and countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.

Cryptography and Security

What problem does this paper attempt to address?

This paper addresses the privacy and security threats facing large language models (LLMs). LLMs have demonstrated strong language understanding and generation capabilities in applications such as machine translation and chatbots, but they also expose privacy and security issues throughout their life cycle, such as leakage of information from training data and malicious attacks, that differ from the risks faced by traditional language models. Existing surveys describe various risks and countermeasures but lack a systematic taxonomy of these unique threats. The paper therefore organizes its discussion around five scenarios (pre-training, fine-tuning, retrieval-augmented generation systems, deployment, and LLM-based agents) and discusses the specific privacy and security risks and countermeasures for each. Its contribution is a comprehensive survey that clarifies the threat models LLMs face in each scenario, classifies the unique risks, and analyzes the corresponding attack targets, motivations, and methods. The paper also covers related privacy and security topics such as federated learning, machine unlearning, and watermarking, and outlines feasible research directions toward more secure and reliable LLM applications.

Security and Privacy Challenges of Large Language Models: A Survey

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

Privacy in Large Language Models: Attacks, Defenses and Future Directions

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

Privacy-Preserving Large Language Models: Mechanisms, Applications, and Future Directions

Exploring Vulnerabilities and Protections in Large Language Models: A Survey

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices

Recent Advances in Attack and Defense Approaches of Large Language Models

Privacy Issues in Large Language Models: A Survey

A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

Exploring the Privacy Protection Capabilities of Chinese Large Language Models

Large Language Model Safety: A Holistic Survey

Threats to Pre-trained Language Models: Survey and Taxonomy

The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies

Large Language Model Supply Chain: Open Problems From the Security Perspective

A Survey of Large Language Models for Cyber Threat Detection