RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code

Jiachi Chen, Qingyuan Zhong, Yanlin Wang, Kaiwen Ning, Yongkun Liu, Zenan Xu, Zhe Zhao, Ting Chen, Zibin Zheng
DOI: https://doi.org/10.1145/3691620.3695480
2024-09-24
Abstract: The emergence of Large Language Models (LLMs) has significantly influenced various aspects of software development activities. Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and to be abused by malicious developers to create malicious code. Several previous studies have focused on the ability of LLMs to resist the generation of harmful content that violates human ethical standards, such as biased or offensive content. However, there is no research evaluating the ability of LLMs to resist malicious code generation. To fill this gap, we propose RMCBench, the first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation. This benchmark employs two scenarios: a text-to-code scenario, where LLMs are prompted with descriptions to generate code, and a code-to-code scenario, where LLMs translate or complete existing malicious code. Based on RMCBench, we conduct an empirical study on 11 representative LLMs to assess their ability to resist malicious code generation. Our findings indicate that current LLMs have a limited ability to resist malicious code generation, with an average refusal rate of 40.36% in the text-to-code scenario and 11.52% in the code-to-code scenario. The average refusal rate of all LLMs in RMCBench is only 28.71%; ChatGPT-4 has a refusal rate of only 35.73%. We also analyze the factors that affect LLMs' ability to resist malicious code generation and provide implications for developers to enhance model robustness.
Software Engineering, Artificial Intelligence
### What problem does this paper attempt to address?

This paper addresses the lack of any evaluation of how well large language models (LLMs) resist generating malicious code. Although LLMs perform well in software development activities, they also carry significant risks: they can generate harmful content, and malicious developers can misuse them to create malicious code. Previous research has focused mainly on LLMs' resistance to content that violates human ethical standards (such as biased or offensive content), but no prior work has specifically evaluated their resistance to malicious code generation.

To fill this gap, the authors propose **RMCBench**, the first benchmark designed specifically to evaluate the ability of LLMs to resist malicious code generation. RMCBench contains 473 prompts that test LLMs in two scenarios:

1. **Text-to-Code scenario**: LLMs generate code from natural language descriptions.
2. **Code-to-Code scenario**: LLMs translate or complete existing malicious code.

Using these two scenarios, the authors evaluate 11 representative LLMs across different tasks and analyze the factors that affect the models' ability to resist malicious code generation, which yields suggestions for developers on enhancing model robustness.

### Main contributions:

- Proposed **RMCBench**, the first benchmark for evaluating the ability of LLMs to resist malicious code generation.
- Conducted the first empirical study of 11 representative LLMs, evaluating their performance across different scenarios and tasks.
- Analyzed the factors that affect the ability of LLMs to resist malicious code generation and provided suggestions for improvement.
- Made the relevant code and data publicly available for further research.
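The metric used throughout is the refusal rate: the share of prompts for which a model declines to produce the requested code. The sketch below shows one way such a rate could be computed per scenario, assuming each response has already been labeled as a refusal or not. The record layout, the function name `refusal_rates`, and the toy labels are hypothetical illustrations, not taken from the RMCBench code or data.

```python
from collections import defaultdict


def refusal_rates(records):
    """Return the refusal rate (in percent) for each scenario.

    `records` is an iterable of (scenario, refused) pairs, where `refused`
    is True when the model declined to produce the requested code.
    This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for scenario, refused in records:
        totals[scenario] += 1
        refusals[scenario] += int(refused)
    return {s: 100.0 * refusals[s] / totals[s] for s in totals}


# Toy labels for illustration only; these are NOT RMCBench results.
toy_records = [
    ("text-to-code", True),
    ("text-to-code", False),
    ("text-to-code", True),
    ("text-to-code", False),
    ("code-to-code", False),
    ("code-to-code", False),
    ("code-to-code", True),
    ("code-to-code", False),
]

rates = refusal_rates(toy_records)
for scenario, rate in rates.items():
    print(f"{scenario}: {rate:.2f}% refused")
```

Keeping the counts per scenario, rather than a single pooled counter, makes it possible to report both the per-scenario rates and a pooled rate from the same labels.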
### Research findings:

- In the Text-to-Code scenario, the average refusal rate across all 11 LLMs is 40.36%; the refusal rates for Level 1, Level 2, and Level 3 prompts are 60.80%, 28.43%, and 36.18%, respectively.
- In the Code-to-Code scenario, the average refusal rate is only 11.52%, far lower than in the Text-to-Code scenario.
- Model parameters, model type, malicious code type, programming language, and input context length all affect an LLM's resistance.

With these findings, the authors hope to raise awareness of LLM security and point to directions for future improvement.
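The overall 28.71% refusal rate quoted in the abstract is not the simple mean of the two scenario averages ((40.36 + 11.52) / 2 = 25.94), which suggests it is weighted by the number of prompts in each scenario. The sketch below works through that arithmetic, assuming the overall figure is a prompt-weighted mean. The resulting split is an inference from the reported percentages, not a figure stated in this summary, and the function name `implied_split` is hypothetical.

```python
TOTAL_PROMPTS = 473   # benchmark size, from the abstract
T2C_AVG = 40.36       # average refusal rate, text-to-code (%)
C2C_AVG = 11.52       # average refusal rate, code-to-code (%)
OVERALL_AVG = 28.71   # average refusal rate over the whole benchmark (%)


def implied_split(total, rate_a, rate_b, overall):
    """Solve rate_a * n_a + rate_b * (total - n_a) = overall * total for n_a.

    Returns (n_a, n_b). Assumes `overall` is a prompt-weighted mean of the
    two scenario rates; this assumption is not confirmed by the summary.
    """
    n_a = total * (overall - rate_b) / (rate_a - rate_b)
    return n_a, total - n_a


n_t2c, n_c2c = implied_split(TOTAL_PROMPTS, T2C_AVG, C2C_AVG, OVERALL_AVG)
print(f"implied text-to-code prompts: {n_t2c:.1f}")
print(f"implied code-to-code prompts: {n_c2c:.1f}")
```

Under that assumption, the reported percentages are consistent with roughly 282 text-to-code prompts and 191 code-to-code prompts, so the text-to-code scenario would account for about 60% of the benchmark.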