Abstract: Generative pre-trained transformers (GPTs) are a type of large language machine learning model that is unusually adept at producing novel and coherent natural language. This study examines the ability of GPT models to generate novel and correct versions, and notably very insecure versions, of implementations of the cryptographic hash function SHA-1. The GPT models Llama-2-70b-chat-hf, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The models are prompted to rewrite each function using a modified version of the localGPT framework and langchain, which provide word-embedding context of the full source code and header files to the model. This resulted in over 150,000 function rewrite output text blocks, approximately 50,000 of which could be parsed as C code and subsequently compiled. The generated code is analyzed for compilability, algorithmic correctness, memory leaks, compiler optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants pose a high implementation security risk because they are correct for some test vectors but incorrect for others. Additionally, many function implementations did not match the reference SHA-1 algorithm but produced hashes that have some of the basic characteristics of hash functions. Many of the function rewrites contained serious flaws such as memory leaks, integer overflows, out-of-bounds accesses, use of uninitialised values, and compiler optimization instability. Compiler optimization settings and SHA-256 checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax. Using this clustering, over 100,000 novel and correct versions of the SHA-1 codebase were generated in which each component C function of the reference implementation differs from the original code.
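The risk of a variant being correct for some test vectors but not others can be illustrated with a minimal Python sketch. It is not taken from the paper: `sha1_variant`, `check_variant`, and the `length_mask` parameter are hypothetical names, and the flaw shown (a message-length field truncated to 8 bits) is an assumed example, not one of the generated C variants. Each candidate is compared against `hashlib.sha1` as the reference.

```python
import hashlib
import struct


def _rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_variant(message, length_mask=0xFFFFFFFFFFFFFFFF):
    """Pure-Python SHA-1. A narrower length_mask (hypothetical flaw)
    truncates the bit-length field that is written into the padding."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    bit_len = (len(message) * 8) & length_mask
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_len)
    for off in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 80):
            w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (_rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, _rotl(b, 30), c, d
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]
    return "".join(f"{x:08x}" for x in h)


def check_variant(candidate, vectors):
    """Return one pass/fail flag per test vector, using hashlib as reference."""
    return [candidate(v) == hashlib.sha1(v).hexdigest() for v in vectors]


def flawed_variant(message):
    """Hypothetical flawed variant: the length field keeps only its low 8 bits."""
    return sha1_variant(message, length_mask=0xFF)


if __name__ == "__main__":
    vectors = [b"", b"abc", b"a" * 31, b"a" * 32, b"a" * 100]
    print(check_variant(sha1_variant, vectors))
    print(check_variant(flawed_variant, vectors))
```

In this sketch the flawed variant agrees with the reference for every message shorter than 32 bytes and disagrees from 32 bytes onward, so a test suite made only of short vectors would not detect it.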

What problem does this paper attempt to address?

The core problem this paper addresses is evaluating the capabilities and risks of Generative Pre-trained Transformer (GPT) models when they generate implementations of a cryptographic hash function. Specifically, the researchers use GPT models to rewrite the C code of the SHA-1 cryptographic hash function and analyze the correctness, security, and potential risks of the generated code.

### Research Background and Problem Description

1. **Characteristics of the GPT Model**:
   - GPT models are large language models that are good at generating natural-language text and have been successfully extended to computer programming languages.
   - However, GPT output is stochastic and not always correct. This matters especially for programming languages, where the syntax and algorithms of the code must be exactly right to ensure the security of the computing system and the application.
2. **Research Motivation**:
   - Although using GPT models to generate computer code has innovative potential, it also brings significant security risks. For example, the generated code may contain serious vulnerabilities such as memory leaks, integer overflows, and out-of-bounds accesses.
   - The research explores whether GPT models can generate a correct and secure implementation of the SHA-1 cryptographic hash function, and reveals the associated security risks.

### Specific Research Contents

1. **Experimental Design**:
   - Three GPT models (Llama-2-70b-chat-hf, Mistral-7B-Instruct-v0.1, zephyr-7b-alpha) are used to rewrite the component functions of the SHA-1 implementation.
   - Each model generates a large number of code variants through different prompts, for a total of more than 150,000 output blocks, of which about 50,000 can be parsed as C code and compiled.
2. **Code Evaluation**:
   - The generated code is tested for whether it compiles, whether the algorithm is correct, whether it leaks memory, and whether its behavior is stable across compiler optimization settings.
   - The character distance between the generated code and the reference implementation is measured to evaluate how similar or different they are.
3. **Problems Found**:
   - Several of the generated function variants are correct on some test vectors but incorrect on others.
   - Many code variants have serious flaws, such as memory leaks, integer overflows, out-of-bounds accesses, and the use of uninitialized values.
   - Changing the compiler optimization settings can change the behavior of some of the generated code.

### Conclusion

This research reveals the serious security risks that may be introduced when GPT models are used to generate or rewrite source code. Although the models show some ability to produce novel code, that code is not necessarily correct or secure, especially for cryptographic algorithms that require an exact implementation. The research therefore recommends caution when using GPT models for actual code generation, especially in safety-critical applications.

### Related Formulas

The paper involves no complex mathematical formulas, but it does rely on technical details such as compiler optimization levels and SHA-256 checksums. For example:

- **Compiler Optimization Levels**: `-O0`, `-O1`, `-O2`, `-O3`, `-Ofast`, `-Os`
- **SHA-256 Hash Verification**: Used to check whether compiled binaries are identical, so that implementations that are equivalent but differ in syntax can be clustered.

```latex
\text{SHA-256}(\text{binary\_file}) = \text{hash\_value}
```

These details support the clustering of equivalent implementations and the rigor of the correctness evaluation.
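The checksum-based clustering described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: `cluster_by_checksum` and the variant names are hypothetical, and the byte strings stand in for compiled binaries that a real run would obtain by compiling each variant at each optimization level.

```python
import hashlib
from collections import defaultdict


def cluster_by_checksum(binaries):
    """Group variants whose compiled bytes are identical.

    `binaries` maps (variant_name, opt_level) -> bytes of the compiled binary.
    Returns {opt_level: {sha256_hex: [variant_name, ...]}}.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for (variant, opt_level), blob in binaries.items():
        digest = hashlib.sha256(blob).hexdigest()
        clusters[opt_level][digest].append(variant)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {opt: dict(groups) for opt, groups in clusters.items()}


if __name__ == "__main__":
    # Placeholder bytes (assumption): two syntactically different variants
    # that compile to different binaries at -O0 but the same binary at -O2.
    binaries = {
        ("variant_a", "-O0"): b"binary-a-unoptimized",
        ("variant_b", "-O0"): b"binary-b-unoptimized",
        ("variant_a", "-O2"): b"binary-optimized",
        ("variant_b", "-O2"): b"binary-optimized",
    }
    for opt, groups in cluster_by_checksum(binaries).items():
        print(opt, [sorted(names) for names in groups.values()])
```

Variants that share a digest at a given optimization level produced byte-identical binaries and can be treated as one implementation at that level, even if their source text differs.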

Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models
