Abstract:Aim/Purpose: This paper is part of a multi-case study that aims to test whether generative AI makes an effective coding assistant. Particularly, this work evaluates the ability of two AI chatbots (ChatGPT and Bing Chat) to generate concise computer code, considers ethical issues related to generative AI, and offers suggestions for how to improve the technology. Background: Since the release of ChatGPT in 2022, generative artificial intelligence has steadily gained wide use in software development. However, there is conflicting information on the extent to which AI helps developers be more productive in the long term. Also, whether using generated code violates copyright restrictions is a matter of debate. Methodology: ChatGPT and Bing Chat were asked the same question, their responses were recorded, and the percentage of each chatbot’s code that was extraneous was calculated. Also examined were qualitative factors, such as how often the generated code required modifications before it would run. Contribution: This paper adds to the limited body of research on how effective generative AI is at aiding software developers and how to practically address its shortcomings. Findings: Results of AI testing observed that 0.7% of lines and 1.4% of characters in ChatGPT’s responses were extraneous, while 0.7% of lines and 1.1% of characters in Bing Chat’s responses were extraneous. This was well below the 2% threshold, meaning both chatbots can generate concise code. However, code from both chatbots frequently had to be modified before it would work; ChatGPT’s code needed major modifications 30% of the time and minor ones 50% of the time, while Bing Chat’s code needed major modifications 10% of the time and minor ones 70% of the time. Recommendations for Practitioners: Companies building generative AI solutions are encouraged to use this study’s findings to improve their models, specifically by decreasing error rates, adding more training data for programming languages with less public documentation, and implementing a mechanism that checks code for syntactical errors. Developers can use the findings to increase their productivity, learning how to reap generative AI’s full potential while being aware of its limitations. Recommendation for Researchers: Researchers are encouraged to continue where this paper left off, exploring more programming languages and prompting styles than the scope of this study allowed. Impact on Society: As artificial intelligence touches more areas of society than ever, it is crucial to make AI models as accurate and dependable as possible. If practitioners and researchers use the findings of this paper to improve coders’ experience with generative AI, it will make millions of developers more productive, saving their companies money and time. Future Research: The results of this study can be strengthened (or refuted) by a future study with a large, diverse dataset that more fully represents the programming languages and prompting styles developers tend to use. Moreover, further research can examine the reasons generative AI fails to deliver working code, which will yield valuable insights into improving these models.

AI-assisted coding: Experiments with GPT-4

OpenAi's GPT4 as coding assistant

Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses

Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

Comparing large language models and human programmers for generating programming code

Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust

Significant Productivity Gains through Programming with Large Language Models

AI-enhanced Auto-correction of Programming Exercises: How Effective is GPT-3.5?

Kattis vs. ChatGPT: Assessment and Evaluation of Programming Tasks in the Age of Artificial Intelligence

Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors

Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code

Coding with AI as an Assistant: Can AI Generate Concise Computer Code?

Code Generation Using Self-Interactive Assistant

Sparks of Artificial General Intelligence: Early experiments with GPT-4

A comparison of human, GPT-3.5, and GPT-4 performance in a university-level coding course

Enhancing Programming Error Messages in Real Time with Generative AI

A Preliminary Analysis on the Code Generation Capabilities of GPT-3.5 and Bard AI Models for Java Functions

No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT

"Give me the code" -- Log Analysis of First-Year CS Students' Interactions With GPT