LLMSecCode: Evaluating Large Language Models for Secure Coding

Anton Rydén, Erik Näslund, Elad Michael Schiller, Magnus Almgren
2024-08-29
Abstract: The rapid deployment of Large Language Models (LLMs) requires careful consideration of their effect on cybersecurity. Our work aims to improve the selection process of LLMs that are suitable for facilitating Secure Coding (SC). This raises challenging research questions, such as (RQ1) Which functionality can streamline the LLM evaluation? (RQ2) What should the evaluation measure? (RQ3) How to attest that the evaluation process is impartial? To address these questions, we introduce LLMSecCode, an open-source evaluation framework designed to assess LLM SC capabilities objectively. We validate the LLMSecCode implementation through experiments. When varying parameters and prompts, we find a 10% and 9% difference in performance, respectively. We also compare some results to reliable external actors, where our results show a 5% difference. We strive to ensure the ease of use of our open-source framework and encourage further development by external actors. With LLMSecCode, we hope to encourage the standardization and benchmarking of LLMs' capabilities in security-oriented code and tasks.
Cryptography and Security; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses how to select large language models (LLMs) that are suitable for secure coding (SC), and how to check that their performance on automated program repair (APR) and code generation (CG) tasks meets secure-coding requirements. Specifically, it focuses on three research questions:

1. **RQ1**: Which functionality can streamline the evaluation of LLMs' secure coding capabilities?
2. **RQ2**: What should the evaluation measure?
3. **RQ3**: How can the evaluation process be shown to be impartial?

To answer these questions, the authors introduce an open-source evaluation framework named LLMSecCode. The framework aims to evaluate the secure coding capabilities of LLMs objectively, and the authors validate its implementation through experiments.

### Detailed Explanation

#### Research Background

As LLMs are rapidly deployed, their use in cybersecurity is becoming more widespread. In secure coding in particular, LLMs have the potential to find errors and suggest security improvements. Selecting an LLM that is well suited to secure coding is a complex problem, however, so the authors pose three research questions and develop the LLMSecCode framework to address them.

#### Functions of the LLMSecCode Framework

- **RQ1**: To streamline the evaluation of LLMs' secure coding capabilities, the framework provides two key functions:
  - Adjustable model parameters (such as temperature and top-p), so that the effect of different settings on performance can be observed.
  - Customizable prompts, so that the evaluation can be adapted to different task requirements.
- **RQ2**: The evaluation measures two metrics:
  - pass@k: the probability that at least one of k generated code samples passes the unit tests.
  - Pass rate: the ratio of fault-free solutions to all evaluated solutions.
- **RQ3**: To keep the evaluation process impartial, the framework relies on the following measures:
  - Using the same methods and tools for all comparisons.
  - Drawing on a wide range of synthetic and real-world datasets.
  - Undergoing community review through open-source development.

#### Experimental Results

The authors validate the LLMSecCode implementation through experiments. Varying parameters and prompts changes the measured LLM performance by 10% and 9%, respectively. Compared with results reported by reliable external actors, the LLMSecCode results differ by 5%, which supports the correctness of the implementation.

#### Contributions

The main contributions of the LLMSecCode framework are:

- A general open-source framework for evaluating the capabilities of LLMs in APR, CG, and SC.
- An experimental validation of the framework's implementation and impartiality.
- A unified platform on which model creators and users can evaluate and benchmark the secure coding capabilities of LLMs.

In conclusion, the paper addresses the problem of selecting and evaluating LLMs for secure coding by introducing the LLMSecCode framework, which gives future research a common tool for this kind of benchmarking.
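The two RQ2 metrics described above can be illustrated with a minimal Python sketch. It assumes the combinatorial pass@k estimator that is widely used in code-generation benchmarks, where n samples are generated per task and c of them pass the unit tests; the summary does not state which estimator LLMSecCode actually implements, so that choice is an assumption. The function names `pass_at_k` and `pass_rate` are hypothetical and are not taken from the framework.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task (assumed estimator, not confirmed by the source).

    n: number of samples generated for the task
    c: number of those samples that pass the unit tests
    k: number of samples the user is allowed to draw
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    # math.comb returns 0 when k > n - c, so the result is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_rate(fault_free: int, evaluated: int) -> float:
    """Ratio of fault-free solutions to all evaluated solutions."""
    if evaluated <= 0 or not 0 <= fault_free <= evaluated:
        raise ValueError("require 0 <= fault_free <= evaluated and evaluated > 0")
    return fault_free / evaluated
```

For example, with 10 samples of which 3 pass, `pass_at_k(10, 3, 1)` is approximately 0.3, which matches the plain fraction of passing samples. In a full benchmark the per-task values would typically be averaged across all tasks.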