A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, Yue Zhang
DOI: https://doi.org/10.1016/j.hcc.2024.100211
2024-03-21
Abstract: Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). Meanwhile, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, the potential risks and threats associated with their use, and the inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the surveyed papers into "The Good" (beneficial LLM applications), "The Bad" (offensive applications), and "The Ugly" (vulnerabilities of LLMs and their defenses). We report several findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have also identified areas that require further research. For example, research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality. Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on LLMs' potential to both bolster and jeopardize cybersecurity.
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
This paper explores the multi-faceted impact of large language models (LLMs) on security and privacy, specifically:

1. **The positive contributions of LLMs to security and privacy** (The Good):
   - How LLMs have a positive impact on security and privacy in various fields (such as code security and data privacy).
   - What advantages LLMs have compared to traditional methods.
2. **The potential risks and threats brought by LLMs** (The Bad):
   - What cybersecurity risks and threats may be triggered by using LLMs.
   - How LLMs can be used for malicious purposes and what types of cyberattacks can be carried out using LLMs.
3. **The vulnerabilities and weaknesses of LLMs** (The Ugly):
   - What vulnerabilities and weaknesses exist in LLMs themselves.
   - How these vulnerabilities threaten security and privacy and how to defend against these threats.

### Specific problem summary

- **RQ1: How do LLMs have a positive impact on security and privacy in different fields?**
  - The role of LLMs in code security, such as detecting vulnerabilities, generating test cases, and fixing code.
  - The application of LLMs in data security and privacy protection, such as ensuring data integrity, confidentiality, and traceability.
- **RQ2: What potential risks and threats will the use of LLMs bring in terms of cybersecurity?**
  - LLMs may be used for hardware-level attacks, operating-system-level attacks, software-level attacks, phishing, and user-level attacks.
  - User-level attacks are particularly common due to the human-like reasoning ability of LLMs, threatening security and privacy.
- **RQ3: What vulnerabilities and weaknesses exist inside LLMs?**
  - Vulnerabilities are classified as inherent to AI models (such as data poisoning, backdoor attacks, and training-data extraction) or not inherent to AI models (such as remote code execution, prompt injection, and side-channel attacks).
  - Defense mechanisms against these vulnerabilities, including strategies at the architecture level, training stage, and inference stage.

Through these questions, the paper aims to comprehensively evaluate the potential and challenges of LLMs in the security and privacy fields and provide directions for future research.
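
The RQ3 classification can be written down as a small lookup table, which makes the two vulnerability classes and the three defense stages easier to scan. This is only an illustrative sketch: the names `VulnClass`, `VULNERABILITIES`, `DEFENSE_STAGES`, and `by_class` are hypothetical and do not come from the paper, and the entries are limited to the examples named in the summary above.

```python
from enum import Enum


class VulnClass(Enum):
    """Two vulnerability classes named in the RQ3 summary (labels are illustrative)."""
    AI_INHERENT = "inherent to AI models"
    NON_AI_INHERENT = "not inherent to AI models"


# Only the examples listed in the summary; the survey itself covers more.
VULNERABILITIES = {
    "data poisoning": VulnClass.AI_INHERENT,
    "backdoor attack": VulnClass.AI_INHERENT,
    "training-data extraction": VulnClass.AI_INHERENT,
    "remote code execution": VulnClass.NON_AI_INHERENT,
    "prompt injection": VulnClass.NON_AI_INHERENT,
    "side-channel attack": VulnClass.NON_AI_INHERENT,
}

# Stages at which the summary says defenses can be applied.
DEFENSE_STAGES = ("architecture", "training", "inference")


def by_class(vuln_class):
    """Return the listed vulnerability names in a class, sorted alphabetically."""
    return sorted(name for name, c in VULNERABILITIES.items() if c is vuln_class)


if __name__ == "__main__":
    for vuln_class in VulnClass:
        print(f"{vuln_class.value}: {', '.join(by_class(vuln_class))}")
    print("defense stages:", ", ".join(DEFENSE_STAGES))
```

A flat dictionary is used rather than a nested structure because the summary assigns each example to exactly one class.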