Validating AI-Generated Code with Live Programming

Kasra Ferdowsi, Ruanqianqian Huang, Michael B. James, Nadia Polikarpova, Sorin Lerner
DOI: https://doi.org/10.1145/3613904.3642495
2024-02-24
Abstract: AI-powered programming assistants are increasingly gaining popularity, with GitHub Copilot alone used by over a million developers worldwide. These tools are far from perfect, however, producing code suggestions that may be incorrect in subtle ways. As a result, developers face a new challenge: validating AI's suggestions. This paper explores whether Live Programming (LP), a continuous display of a program's runtime values, can help address this challenge. To answer this question, we built a Python editor that combines an AI-powered programming assistant with an existing LP environment. Using this environment in a between-subjects study (N=17), we found that by lowering the cost of validation by execution, LP can mitigate over- and under-reliance on AI-generated programs and reduce the cognitive load of validation for certain types of tasks.
Human-Computer Interaction, Programming Languages
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses how developers can validate the correctness of code generated by artificial intelligence (AI). Specifically, it explores whether **Live Programming (LP)**, a continuous display of a program's runtime values, can help developers validate AI-generated code more effectively.

#### Background and Motivation

With the development of large language models, AI programming assistants such as GitHub Copilot, Amazon CodeWhisperer, and ChatGPT are becoming increasingly popular. While these tools can automate many traditional programming tasks, the code they generate may contain subtle errors or fail to match the developer's intent. Developers therefore face a new challenge: validating AI-generated code.

#### Specific Manifestations of the Verification Problem

1. **Verification Bottleneck**: Research shows that verifying AI-generated code is one of the most common activities in AI-assisted programming, and many developers have difficulty assessing the correctness of that code.
2. **Trust Issues**: Because verification is difficult, developers may either lose trust in AI assistants (under-reliance) or accept AI suggestions blindly (over-reliance), thereby introducing errors and security vulnerabilities.

#### Solution

The paper proposes using live programming (LP) to help validate AI-generated code. Live programming continuously displays variable values as the program executes, which lowers the cost of validating code by running it and thereby reduces the cognitive load of verification.

### Experimental Design

To test this hypothesis, the authors built a Python editor that integrates an AI programming assistant with an existing live programming environment. They then used this environment in a between-subjects study (N=17) to examine the effect of live programming on validating AI-generated code.

### Key Findings

1. **Reduced Over- and Under-reliance**: By lowering the cost of validation by execution, live programming mitigates both over-reliance on and under-reliance on (distrust of) AI-generated code.
2. **Reduced Cognitive Load**: For certain types of tasks, live programming reduces the cognitive load of validating AI-generated code.
3. **Improved Effectiveness of Verification Strategies**: Live programming enables developers to use runtime values for verification more effectively, especially in API-intensive tasks.

### Conclusion

Live programming, as an auxiliary tool, can help developers validate AI-generated code: it can reduce cognitive load for certain types of tasks and mitigate both over-reliance on and distrust of AI assistants. These findings offer design guidance for future AI-assisted programming environments.
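
### Illustrative Sketch: Validation by Execution

To make the idea of "a continuous display of a program's runtime values" concrete, the following is a minimal sketch of how runtime values can be captured line by line in Python. It is not the authors' editor or the LP environment used in the study. The helper `trace_values` and the example function `suggested_mean` are hypothetical names introduced here for illustration only, and the sketch relies on the standard `sys.settrace` hook rather than on any tool described in the paper.

```python
import sys


def trace_values(func, *args):
    """Run func(*args) and record its local variables at each executed line.

    Hypothetical helper for illustration; not part of the paper's tool.
    """
    snapshots = []  # entries are (line_offset, locals_dict) or ("return", value)

    def tracer(frame, event, arg):
        # Trace only the target function's own frame, not its callees.
        if frame.f_code is not func.__code__:
            return None
        if event == "line":
            # A "line" event fires just before the line runs, so the
            # snapshot shows the state produced by the preceding lines.
            offset = frame.f_lineno - func.__code__.co_firstlineno
            snapshots.append((offset, dict(frame.f_locals)))
        elif event == "return":
            snapshots.append(("return", arg))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook, even on error
    return result, snapshots


# Hypothetical AI suggestion with a subtle bug: floor division
# silently truncates the mean.
def suggested_mean(xs):
    total = 0
    for x in xs:
        total += x
    return total // len(xs)


result, snapshots = trace_values(suggested_mean, [1, 2])
for where, state in snapshots:
    print(where, state)
```

Reading the recorded values, a developer would see `total` reach 3 while the function returns 1 for the input `[1, 2]`, which exposes the truncation without any need to reason through the code by hand. This is the kind of low-cost validation by execution that the summary above attributes to live programming. A real LP environment would refresh such a display continuously as the code is edited instead of on an explicit call.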