Abstract
Aim/Purpose: This paper is part of a multi-case study testing whether generative AI makes an effective coding assistant. Specifically, it evaluates the ability of two AI chatbots (ChatGPT and Bing Chat) to generate concise computer code, considers ethical issues related to generative AI, and offers suggestions for improving the technology.
Background: Since the release of ChatGPT in 2022, generative artificial intelligence has steadily gained wide use in software development. However, evidence conflicts on the extent to which AI makes developers more productive in the long term, and whether using generated code violates copyright restrictions remains a matter of debate.
Methodology: ChatGPT and Bing Chat were asked the same questions, their responses were recorded, and the percentage of each chatbot's code that was extraneous was calculated. Qualitative factors were also examined, such as how often the generated code required modifications before it would run.
Contribution: This paper adds to the limited body of research on how effective generative AI is at aiding software developers and how to address its shortcomings in practice.
Findings: Testing showed that 0.7% of lines and 1.4% of characters in ChatGPT's responses were extraneous, compared with 0.7% of lines and 1.1% of characters in Bing Chat's responses. All of these figures fall below the 2% threshold, indicating that both chatbots can generate concise code. However, code from both chatbots frequently had to be modified before it would work. ChatGPT's code needed major modifications 30% of the time and minor ones 50% of the time, while Bing Chat's code needed major modifications 10% of the time and minor ones 70% of the time.
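The Methodology and Findings express extraneous code as a percentage of lines and of characters, judged against a 2% threshold. The abstract does not say how lines were judged extraneous, whether blank lines or newline characters were counted, or whether percentages were pooled across responses. The Python sketch below therefore assumes that a reviewer supplies the indices of extraneous lines for each response, that counts are pooled across responses, that newline characters are ignored, and that the 2% cutoff applies to both metrics. The names `extraneous_stats`, `ExtraneousStats`, and `CONCISENESS_THRESHOLD_PCT` are hypothetical, and the sample input is a toy example, not data from the study.

```python
from dataclasses import dataclass

# Cutoff reported in the abstract; applying it to both metrics is an assumption.
CONCISENESS_THRESHOLD_PCT = 2.0


@dataclass
class ExtraneousStats:
    line_pct: float  # share of lines flagged as extraneous, in percent
    char_pct: float  # share of characters on those lines, in percent

    def is_concise(self, threshold: float = CONCISENESS_THRESHOLD_PCT) -> bool:
        """True when both shares fall below the threshold."""
        return self.line_pct < threshold and self.char_pct < threshold


def extraneous_stats(responses: list[tuple[str, set[int]]]) -> ExtraneousStats:
    """Pool extraneous-line and extraneous-character percentages over responses.

    Each item pairs a chatbot's code with the 0-based indices of lines a
    reviewer judged extraneous (hypothetical input format). Newline
    characters are not counted.
    """
    total_lines = extra_lines = 0
    total_chars = extra_chars = 0
    for code, flagged in responses:
        lines = code.splitlines()
        total_lines += len(lines)
        total_chars += sum(len(line) for line in lines)
        extra_lines += len(flagged)
        extra_chars += sum(len(lines[i]) for i in flagged)
    return ExtraneousStats(
        line_pct=100.0 * extra_lines / total_lines,
        char_pct=100.0 * extra_chars / total_chars,
    )


# Toy input, not data from the study: the second response has one
# leftover comment line flagged as extraneous.
sample = [
    ("import math\nprint(math.sqrt(2))\n", set()),
    ("x = 1\n# TODO: remove\nprint(x)\n", {1}),
]
stats = extraneous_stats(sample)
print(f"{stats.line_pct:.1f}% of lines, {stats.char_pct:.1f}% of characters")
print("concise" if stats.is_concise() else "not concise")
```

Pooling counts weights longer responses more heavily; averaging per-response percentages instead would be an equally plausible reading of the abstract.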
Recommendations for Practitioners: Companies building generative AI solutions are encouraged to use this study's findings to improve their models, specifically by reducing error rates, adding more training data for programming languages with less public documentation, and implementing a mechanism that checks code for syntax errors. Developers can use the findings to become more productive, learning how to realize generative AI's full potential while remaining aware of its limitations.
Recommendation for Researchers: Researchers are encouraged to build on this paper by exploring more programming languages and prompting styles than the scope of this study allowed.
Impact on Society: As artificial intelligence touches more areas of society than ever, it is crucial to make AI models as accurate and dependable as possible. If practitioners and researchers use the findings of this paper to improve coders' experience with generative AI, it could make millions of developers more productive, saving their companies money and time.
Future Research: The results of this study could be strengthened (or refuted) by a future study with a large, diverse dataset that more fully represents the programming languages and prompting styles developers tend to use. Further research could also examine why generative AI fails to deliver working code, which would yield valuable insights for improving these models.
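One way to realize the recommended syntax-checking mechanism, shown here only for generated Python code, is to parse each answer with the standard library's `ast.parse` before returning it and, on failure, re-prompt the model with the parser's error message. This is a minimal sketch, not the authors' design: `generate` stands in for a hypothetical model call, the function names, retry limit, and feedback wording are illustrative assumptions, and other languages would need their own parser or compiler.

```python
import ast
from typing import Callable, Optional


def find_syntax_error(source: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code parses."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


def generate_checked(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt -> Python source
    max_attempts: int = 3,
) -> str:
    """Request code, re-prompting with the parser's error until the code parses."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate(prompt + feedback)
        error = find_syntax_error(code)
        if error is None:
            return code
        feedback = f"\n\nYour previous answer had a syntax error ({error}). Please fix it."
    raise ValueError(f"no syntactically valid code after {max_attempts} attempts")


# Usage with a stand-in generator that first returns code with an unclosed parenthesis.
canned = iter(["print('hi'", "print('hi')"])
fixed = generate_checked("Print hi in Python", lambda p: next(canned))
print(fixed)  # print('hi')
```

A parse check catches only syntax errors; code that parses can still fail at run time or produce wrong results, so it complements rather than replaces testing the generated code.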

Validating AI-Generated Code with Live Programming

Programming with AI: Evaluating ChatGPT, Gemini, AlphaCode, and GitHub Copilot for Programmers

Grounded Copilot: How Programmers Interact with Code-Generating Models

Exploring the Problems, their Causes and Solutions of AI Pair Programming: A Study on GitHub and Stack Overflow

How far are AI-powered programming assistants from meeting developers' needs?

Understanding the Usability of AI Programming Assistants

A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges

Pair Programming With Generative AI

Coding with AI as an Assistant: Can AI Generate Concise Computer Code?

Enhancing Programming Error Messages in Real Time with Generative AI

"It's Weird That it Knows What I Want": Usability and Interactions with Copilot for Novice Programmers

AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation

From Copilot to Pilot: Towards AI Supported Software Development

Assessing Learning of Computer Programming Skills in the Age of Generative Artificial Intelligence

GPTutor: an open-source AI pair programming tool alternative to Copilot

Does Co-Development with AI Assistants Lead to More Maintainable Code? A Registered Report

Exploring the Design Space of Cognitive Engagement Techniques with AI-Generated Code for Enhanced Learning

ProgAI: Enhancing Code Generation with LLMs For Real World Challenges

Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming

Practices and Challenges of Using GitHub Copilot: An Empirical Study

CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs