How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging

Qianou Ma, Hua Shen, Kenneth Koedinger, Tongshuang Wu
DOI: https://doi.org/10.1007/978-3-031-64302-6_19
2024-10-11
Abstract: Large Language Models (LLMs) now excel at generative skills and can create content at impeccable speeds. However, they are imperfect and still make various mistakes. In a Computer Science education context, as these models are widely recognized as "AI pair programmers," it becomes increasingly important to train students on evaluating and debugging the LLM-generated code. In this work, we introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM-agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student performance on debugging by 12% in the pre-to-post test.
Human-Computer Interaction, Software Engineering
What problem does this paper attempt to address?
This paper asks how to teach programming effectively in the era of artificial intelligence, and in particular how to improve students' debugging skills by using large language models (LLMs) as teachable agents. Because LLMs are widely recognized as "AI pair programmers", the authors focus on training computer science students to evaluate and debug LLM-generated code. The work rests on four key points:

1. **Improving Debugging Skills**: The paper stresses the importance of debugging in programming education, especially in introductory computer science courses (such as CS1), where it is often overlooked. The authors argue that students need systematic practice in constructing hypotheses, that is, reasoning about the likely cause of a code error, which is a core step in the debugging process.
2. **Taking Advantage of LLMs**: The paper uses LLMs to generate high-quality debugging materials and builds a system named HypoCompass around them. The system simulates buggy code of the kind beginners write and asks students to act as teaching assistants who help these simulated LLM agents debug it. This approach trains students in hypothesis construction and also reduces the burden on teachers of preparing teaching materials.
3. **Promoting Effective Task Allocation**: In HypoCompass, students focus on constructing hypotheses about the causes of code errors, while tasks not directly related to hypothesis construction (such as code completion) are left to the LLM agents. This division of labor lets students practice the core skill more intensively.
4. **Evaluating the Effectiveness and Efficiency of the System**: The authors assess HypoCompass in two evaluation studies. The results show that HypoCompass generates high-quality teaching materials four times faster than humans, significantly improves students' debugging scores (a 12% increase from pre-test to post-test), and reduces the time students need to complete tasks by 14%.

In summary, the paper proposes a new method for improving students' debugging abilities, especially at the beginner stage, by combining the technical strengths of LLMs with educational practice.
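
To make the hypothesis-construction task concrete, the following is a minimal sketch of the kind of exercise the summary describes: a piece of buggy novice-style code, a small test suite, and a set of candidate hypotheses about the cause of the failure. It is not taken from the paper. The function `buggy_max`, the test cases, the hypothesis texts, and the helpers `failing_tests` and `check_hypothesis` are all hypothetical names invented for illustration, and the buggy code is hand-written here rather than produced by an LLM.

```python
from dataclasses import dataclass


def buggy_max(values):
    """Hypothetical novice-style solution meant to return the largest element."""
    largest = 0  # Planted bug: wrong when every value is negative.
    for v in values:
        if v > largest:
            largest = v
    return largest


@dataclass
class TestCase:
    args: list
    expected: int


# Illustrative test suite (an assumption, not from the paper).
TEST_SUITE = [
    TestCase([1, 5, 3], 5),
    TestCase([7], 7),
    TestCase([-4, -2, -9], -2),
]

# Candidate explanations a student acting as teaching assistant chooses between.
HYPOTHESES = {
    "A": "The loop skips the last element of the list.",
    "B": "Starting `largest` at 0 breaks lists where every value is negative.",
}


def failing_tests(func, suite):
    """Return the test cases whose actual output differs from the expected one."""
    return [t for t in suite if func(t.args) != t.expected]


def check_hypothesis(choice, correct="B"):
    """Compare the student's chosen hypothesis with the known cause of the bug."""
    return choice == correct


if __name__ == "__main__":
    for t in failing_tests(buggy_max, TEST_SUITE):
        print(f"FAIL: buggy_max({t.args}) returned {buggy_max(t.args)}, expected {t.expected}")
    print("Hypothesis B correct?", check_hypothesis("B"))
```

In this sketch the student's job is limited to reading the failing test and picking the explanation that accounts for it. Writing the fix would be left to the agent, which mirrors the task allocation described above.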