Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

Manish Bhatt,Sahana Chennabasappa,Cyrus Nikolaidis,Shengye Wan,Ivan Evtimov,Dominik Gabi,Daniel Song,Faizan Ahmad,Cornelius Aschermann,Lorenzo Fontana,Sasha Frolov,Ravi Prakash Giri,Dhaval Kapil,Yiannis Kozyrakis,David LeBlanc,James Milazzo,Aleksandar Straumann,Gabriel Synnaeve,Varun Vontimitta,Spencer Whitman,Joshua Saxe
2023-12-08
Abstract:This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety benchmark to date, CyberSecEval provides a thorough evaluation of LLMs in two crucial security domains: their propensity to generate insecure code and their level of compliance when asked to assist in cyberattacks. Through a case study involving seven models from the Llama 2, Code Llama, and OpenAI GPT large language model families, CyberSecEval effectively pinpointed key cybersecurity risks. More importantly, it offered practical insights for refining these models. A significant observation from the study was the tendency of more advanced models to suggest insecure code, highlighting the critical need for integrating security considerations in the development of sophisticated LLMs. CyberSecEval, with its automated test case generation and evaluation pipeline covers a broad scope and equips LLM designers and researchers with a tool to broadly measure and enhance the cybersecurity safety properties of LLMs, contributing to the development of more secure AI systems.
Cryptography and Security,Machine Learning
What problem does this paper attempt to address?
The problems that this paper attempts to solve mainly focus on two aspects: 1. **Risk of generating insecure code**: - When large - language models (LLMs) generate code, they may violate security best practices or introduce exploitable vulnerabilities. This risk is not theoretical, because developers often accept a large amount of code generated by these models. For example, a study on GitHub shows that 46% of the code on its platform is automatically generated by CoPilot; a study by Meta also found that when developers accept the suggestions of the CodeCompose model, the acceptance rate is 22%. In addition, previous studies have shown that 40% of code suggestions have vulnerabilities, and user studies have pointed out that developers are 10% more likely to accept the wrong code generated by LLM than when they write it themselves. - To mitigate this risk, CYBER SECEVAL has designed an automatic test case generation and evaluation pipeline, which can detect whether there are insecure coding practices in the code generated by LLM and provide directions for improvement. By iteratively optimizing the model according to these evaluation results, LLM designers and researchers can improve the security of the generated code. 2. **Risk of assisting in cyber - attacks**: - Another important question is whether LLMs will assist in cyber - attacks under malicious requests. Although many base models already have the ability to resist illegal and criminal activities, this study explores whether this ability applies to models with coding capabilities. - The study found that the code itself does not directly determine its maliciousness or benignity, and the key lies in the intention. Therefore, CYBER SECEVAL evaluates whether it will provide help under public malicious requests by testing the response of LLM to malicious requests. This helps product designers foresee and mitigate the risks associated with malicious applications. By understanding how their AI systems respond to such requests, developers can implement appropriate security measures, such as rejection skills or user warnings, to prevent the model from being misused. In general, CYBER SECEVAL aims to provide a comprehensive benchmarking tool to help LLM designers and researchers measure and enhance the security of LLM in terms of network security, thereby promoting the development of more secure artificial intelligence systems.