Abstract: The emergence of Large Language Models (LLMs) has significantly influenced various aspects of software development activities. Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and to be abused by malicious developers to create malicious code. Several previous studies have focused on the ability of LLMs to resist the generation of harmful content that violates human ethical standards, such as biased or offensive content. However, there is no research evaluating the ability of LLMs to resist malicious code generation. To fill this gap, we propose RMCBench, the first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation. This benchmark employs two scenarios: a text-to-code scenario, where LLMs are prompted with descriptions to generate code, and a code-to-code scenario, where LLMs translate or complete existing malicious code. Based on RMCBench, we conduct an empirical study on 11 representative LLMs to assess their ability to resist malicious code generation. Our findings indicate that current LLMs have a limited ability to resist malicious code generation, with an average refusal rate of 40.36% in the text-to-code scenario and 11.52% in the code-to-code scenario. The average refusal rate of all LLMs in RMCBench is only 28.71%; ChatGPT-4 has a refusal rate of only 35.73%. We also analyze the factors that affect LLMs' ability to resist malicious code generation and provide implications for developers to enhance model robustness.


### What problem does this paper attempt to solve?

This paper addresses the problem of evaluating how well large language models (LLMs) resist generating malicious code. Although LLMs perform well in software development activities, they also bring significant risks, including the possibility of generating harmful content and of being misused by malicious developers to create malicious code. Previous research has mainly focused on the ability of LLMs to resist generating harmful content that violates human ethical standards (such as biased or offensive content), but no research has specifically evaluated their ability to resist malicious code generation.

To fill this gap, the authors propose **RMCBench**, the first benchmark specifically designed to evaluate the ability of LLMs to resist malicious code generation. RMCBench contains 473 prompts that test LLMs in two scenarios:

1. **Text-to-Code scenario**: LLMs generate code from natural language descriptions.
2. **Code-to-Code scenario**: LLMs translate or complete existing malicious code.

Using these two scenarios, the authors evaluate 11 representative LLMs across different tasks and analyze the factors that affect their ability to resist malicious code generation, which yields suggestions for developers on improving model robustness.

### Main contributions:

- Proposed **RMCBench**, the first benchmark for evaluating the ability of LLMs to resist malicious code generation.
- Conducted the first empirical study on 11 representative LLMs to evaluate their performance in different scenarios and tasks.
- Analyzed the factors that affect the ability of LLMs to resist malicious code generation and provided improvement suggestions.
- Made the relevant code and data publicly available for further research.

### Research findings:

- In the Text-to-Code scenario, the average refusal rate of all 11 LLMs is 40.36%, and the refusal rates for Level 1, Level 2, and Level 3 prompts are 60.80%, 28.43%, and 36.18% respectively.
- In the Code-to-Code scenario, the average refusal rate is only 11.52%, far lower than in the Text-to-Code scenario.
- Model parameters, model type, malicious code type, programming language, and input context length all affect the ability of LLMs to resist malicious code generation.

With this study, the authors hope to raise awareness of LLM security and provide directions for future improvements.
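The reported figures can be cross-checked with a little arithmetic. The sketch below assumes (this is not stated in the summary) that the overall 28.71% refusal rate is a prompt-count-weighted average of the two per-scenario rates. Under that assumption, the 473 prompts and the three reported percentages imply how many prompts fall into each scenario. The function names `implied_split` and `weighted_rate` are illustrative, not taken from the paper's code.

```python
def implied_split(total, rate_a, rate_b, overall):
    """Solve overall = (n_a * rate_a + n_b * rate_b) / total for n_a,
    where n_b = total - n_a. Returns (n_a, n_b) as floats."""
    n_a = total * (overall - rate_b) / (rate_a - rate_b)
    return n_a, total - n_a


def weighted_rate(counts, rates):
    """Average of `rates` weighted by the prompt counts in `counts`."""
    return sum(c * r for c, r in zip(counts, rates)) / sum(counts)


# Figures taken from the summary: 473 prompts, 40.36% (text-to-code),
# 11.52% (code-to-code), 28.71% (overall).
n_t2c, n_c2c = implied_split(473, 40.36, 11.52, 28.71)
print(round(n_t2c), round(n_c2c))  # implied prompt counts per scenario

# Sanity check: recompute the overall rate from the rounded counts.
check = weighted_rate([round(n_t2c), round(n_c2c)], [40.36, 11.52])
print(f"{check:.2f}")
```

If the weighting assumption holds, the recomputed overall rate should agree with the reported 28.71% to two decimal places, which suggests the three headline numbers are mutually consistent. The implied counts are an inference from rounded percentages, not a figure reported in this summary.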

RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code

What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models

On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Safety Assessment of Chinese Large Language Models

CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs

Comparing Robustness Against Adversarial Attacks in Code Generation: LLM-Generated vs. Human-Written

OR-Bench: An Over-Refusal Benchmark for Large Language Models

How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

Are You Human? An Adversarial Benchmark to Expose LLMs

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

Red Teaming Language Model Detectors with Language Models

LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked

Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility