Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing (NLP) tasks, such as machine translation, question answering, and summarization. LLMs are also highly valuable in supporting software engineering tasks, particularly code generation. Automatic code generation is the process of automatically generating source code or executable code from given specifications or requirements, improving developer productivity. In this study, we perform a systematic empirical assessment of the quality of code generated by ChatGPT, a recent state-of-the-art LLM product. We leverage 728 algorithm problems in five languages (i.e., C, C++, Java, Python, and JavaScript) and 18 CWEs with 54 code scenarios for the code generation task. Our evaluation encompasses a comprehensive analysis of code snippets generated by ChatGPT, focusing on three critical aspects: correctness, complexity, and security. We also specifically investigate ChatGPT’s ability to engage in a multi-round fixing process (i.e., ChatGPT’s dialog ability, in which users chat with ChatGPT to fix generated buggy code) to facilitate code generation. By delving into the generated code and examining the experimental results, this work provides valuable insights into the performance of ChatGPT on code generation tasks across these three aspects. The experimental results demonstrate that (1) ChatGPT is better at generating functionally correct code in different languages for problems before 2021 than for problems after 2021, with a 48.14% advantage in Accepted rate on the judgment platform, but ChatGPT’s ability to directly fix erroneous code through the multi-round fixing process to achieve correct functionality is relatively weak; (2) the distribution of cyclomatic and cognitive complexity levels for code snippets varies across languages, and the multi-round fixing process with ChatGPT generally preserves or increases the complexity levels of code snippets; (3) in algorithm scenarios with C, C++, and Java, and in CWE scenarios with C and Python3, the code generated by ChatGPT contains relevant vulnerabilities, although the multi-round fixing process for vulnerable code snippets shows promising results, with more than 89% of vulnerabilities successfully addressed; and (4) code generation may be affected by ChatGPT’s non-determinism, resulting in variations of code snippets in functional correctness, complexity, and security. Overall, our findings uncover potential issues and limitations that arise in ChatGPT-based code generation and lay the groundwork for improving AI and LLM-based code generation techniques.

ChatGPT-Generated Code Assignment Detection Using Perplexity of Large Language Models (Student Abstract)

ChatGPT Code Detection: Techniques for Uncovering the Source of Code

Zero-Shot Detection of Machine-Generated Codes

What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models

Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers

Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases

Distinguishing LLM-generated from Human-written Code by Contrastive Learning

How Robust Is a Large Pre-trained Language Model for Code Generation? A Case on Attacking GPT2

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education

A Closer Look at Different Difficulty Levels Code Generation Abilities of ChatGPT

No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT

How Secure is Code Generated by ChatGPT?

Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation

Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues

Fighting Fire with Fire: Can ChatGPT Detect AI-generated Text?

HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis

Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection Via Querying ChatGPT

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Evade ChatGPT Detectors via A Single Space

Discriminating Human-authored from ChatGPT-Generated Code Via Discernable Feature Analysis