AI-assisted coding: Experiments with GPT-4

Russell A Poldrack, Thomas Lu, Gašper Beguš
2023-04-26
Abstract: Artificial intelligence (AI) tools based on large language models have achieved human-level performance on some computer programming tasks. We report several experiments using GPT-4 to generate computer code. These experiments demonstrate that AI code generation using the current generation of tools, while powerful, requires substantial human validation to ensure accurate performance. We also demonstrate that GPT-4 refactoring of existing code can significantly improve that code along several established metrics for code quality, and we show that GPT-4 can generate tests with substantial coverage, but that many of the tests fail when applied to the associated code. These findings suggest that while AI coding tools are very powerful, they still require humans in the loop to ensure validity and accuracy of the results.
Artificial Intelligence, Software Engineering
What problem does this paper attempt to address?
The paper evaluates how well GPT-4 performs as a programming assistant. Through a series of experiments, the researchers examine three tasks: generating code, refactoring existing code, and automatically generating test cases. Although GPT-4 shows strong capabilities in all three, the study finds that human involvement is still necessary to ensure the validity and accuracy of the code. The paper also points out that GPT-4 has limitations in handling certain mathematical processes and in implementing complex domain-specific concepts, which require further debugging and verification by human experts. Overall, the study emphasizes that current AI coding tools, however powerful, still require human supervision and intervention in programming tasks.