Abstract
Aim/Purpose: This paper is part of a multi-case study that tests whether generative AI makes an effective coding assistant. Specifically, this work evaluates the ability of two AI chatbots (ChatGPT and Bing Chat) to generate concise computer code, considers ethical issues related to generative AI, and offers suggestions for improving the technology.
Background: Since the release of ChatGPT in 2022, generative artificial intelligence has been steadily and widely adopted in software development. However, evidence conflicts on the extent to which AI makes developers more productive in the long term. Whether using generated code violates copyright restrictions also remains a matter of debate.
Methodology: Each question was posed to both ChatGPT and Bing Chat, their responses were recorded, and the percentage of each chatbot’s code that was extraneous was calculated. Qualitative factors were also examined, such as how often the generated code required modifications before it would run.
Contribution: This paper adds to the limited body of research on how effective generative AI is at aiding software developers and how to address its shortcomings in practice.
Findings: In testing, 0.7% of lines and 1.4% of characters in ChatGPT’s responses were extraneous, compared with 0.7% of lines and 1.1% of characters in Bing Chat’s responses. All four figures fall below the study’s 2% threshold for extraneous code, indicating that both chatbots can generate concise code. However, code from both chatbots frequently had to be modified before it would work: ChatGPT’s code needed major modifications 30% of the time and minor ones 50% of the time, while Bing Chat’s code needed major modifications 10% of the time and minor ones 70% of the time.
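The extraneous-code percentages above are ratios of flagged content to total content, compared against a 2% threshold. A minimal sketch of that calculation follows, assuming a reviewer has already marked which lines of each response are extraneous, that counts are pooled across all of one chatbot’s responses, and that characters are counted per line excluding newlines. The abstract does not specify these details, and the names `extraneous_stats` and `ExtraneousStats` are hypothetical.

```python
from dataclasses import dataclass

# Conciseness threshold from the abstract: under 2% extraneous code.
THRESHOLD_PCT = 2.0


@dataclass
class ExtraneousStats:
    line_pct: float  # share of lines marked extraneous, in percent
    char_pct: float  # share of characters on those lines, in percent

    def is_concise(self, threshold: float = THRESHOLD_PCT) -> bool:
        # Assumption: both measures must fall below the threshold.
        return self.line_pct < threshold and self.char_pct < threshold


def extraneous_stats(responses: list[tuple[str, set[int]]]) -> ExtraneousStats:
    """Pool line and character counts over all of one chatbot's responses.

    Each item is (code, extraneous_line_indices), where the indices are
    0-based positions of lines a reviewer judged unnecessary (hypothetical
    input format). Characters are counted per line, excluding newlines.
    """
    total_lines = extra_lines = total_chars = extra_chars = 0
    for code, extraneous in responses:
        lines = code.splitlines()
        total_lines += len(lines)
        total_chars += sum(len(line) for line in lines)
        extra_lines += len(extraneous)
        extra_chars += sum(len(lines[i]) for i in extraneous)
    if total_lines == 0 or total_chars == 0:
        raise ValueError("no code to measure")
    return ExtraneousStats(
        line_pct=100.0 * extra_lines / total_lines,
        char_pct=100.0 * extra_chars / total_chars,
    )


# Illustrative input only, not data from the study.
sample = [
    ("import os\nx = 1\nprint(x)", {0}),  # unused import flagged as extraneous
    ("\n".join(f"y{i} = {i}" for i in range(97)), set()),
]
stats = extraneous_stats(sample)
print(f"{stats.line_pct:.1f}% of lines, {stats.char_pct:.1f}% of characters")
print("concise" if stats.is_concise() else "not concise")
```

Pooling weights longer responses more heavily than averaging per-response percentages would; which approach the study used is not stated in the abstract.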
Recommendations for Practitioners: Companies building generative AI solutions are encouraged to use this study’s findings to improve their models, specifically by reducing error rates, adding more training data for programming languages with less public documentation, and implementing a mechanism that checks generated code for syntax errors. Developers can use the findings to become more productive, learning how to get the most out of generative AI while remaining aware of its limitations.
Recommendations for Researchers: Researchers are encouraged to build on this work by exploring more programming languages and prompting styles than the scope of this study allowed.
Impact on Society: As artificial intelligence reaches more areas of society than ever, it is crucial to make AI models as accurate and dependable as possible. If practitioners and researchers use the findings of this paper to improve coders’ experience with generative AI, it could make millions of developers more productive, saving their companies time and money.
Future Research: The results of this study could be strengthened (or refuted) by a future study with a large, diverse dataset that better represents the programming languages and prompting styles developers tend to use. Further research could also examine why generative AI fails to deliver working code, which would yield valuable insights for improving these models.
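The recommended syntax-checking mechanism could, for Python output, start from the built-in `compile()` function, which parses and compiles source without running it. The sketch below is an illustration under that assumption only: the abstract does not describe a specific mechanism, the helper name `syntax_error_of` is hypothetical, and other languages would need their own parser or compiler.

```python
from typing import Optional


def syntax_error_of(source: str, filename: str = "<generated>") -> Optional[str]:
    """Return a description of the first syntax error in Python source, or None.

    compile() parses and compiles without executing the code, so it catches
    errors such as a missing colon or a 'return' outside a function, but not
    runtime problems such as a missing module or a misused API.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    except ValueError as err:  # e.g. null bytes in source on older Pythons
        return str(err)
    return None


# Hypothetical chatbot outputs used only to exercise the check.
snippets = {
    "ok": "def add(a, b):\n    return a + b\n",
    "missing_colon": "def add(a, b)\n    return a + b\n",
}
for name, code in snippets.items():
    problem = syntax_error_of(code)
    print(name, "->", problem or "no syntax errors")
```

Because code that passes a syntax check can still fail at run time, a check like this could reduce, but not eliminate, the modifications reported in the Findings.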

Coding with AI as an Assistant: Can AI Generate Concise Computer Code?