Abstract: Large Language Models (LLMs) now excel at generative skills and can create content at impeccable speeds. However, they are imperfect and still make various mistakes. In a Computer Science education context, as these models are widely recognized as "AI pair programmers," it becomes increasingly important to train students on evaluating and debugging the LLM-generated code. In this work, we introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM-agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student performance on debugging by 12% in the pre-to-post test.

What problem does this paper attempt to address?

The paper addresses how to teach programming effectively in the era of artificial intelligence, and specifically how to improve students' debugging skills by using large language models (LLMs) as teachable agents. Because LLMs are widely recognized as "AI pair programmers," the authors focus on training computer science students to evaluate and debug LLM-generated code. This involves several key points:

1. **Improving Debugging Skills**: The paper emphasizes the importance of debugging in programming instruction, particularly in introductory computer science courses (such as CS1), where debugging skills are often overlooked. The authors argue that students need systematic practice in constructing hypotheses (i.e., guessing the causes of code errors), which is a core step in the debugging process.
2. **Taking Advantage of LLMs**: The paper uses LLMs to generate high-quality debugging materials and introduces a system named HypoCompass. The system simulates buggy code written by novices, and students act as teaching assistants who help these simulated LLM agents debug it. This approach trains students in hypothesis construction and also reduces instructors' burden in preparing teaching materials.
3. **Promoting Effective Task Allocation**: In HypoCompass, students focus on constructing hypotheses about the causes of code errors, while tasks not directly related to hypothesis construction (such as code completion) are offloaded to the LLM agents. This division of labor lets students practice the core skill more intensively.
4. **Evaluating the Effectiveness and Efficiency of the System**: The authors evaluate HypoCompass in two studies. The results show that HypoCompass generates high-quality teaching materials four times more efficiently than humans, significantly improves students' debugging scores (a 12% increase from pre-test to post-test), and reduces the time students need to complete tasks by 14%.

In summary, the paper proposes a new way to improve students' debugging abilities, especially at the beginner stage, by combining the capabilities of LLMs with educational practice.
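To make the division of labor in point 3 more concrete, here is a toy sketch of one learning-by-teaching round. It is not the HypoCompass implementation: in the real system an LLM generates the buggy code and the teachable agent writes the fix, whereas here the buggy function, the candidate hypotheses, and their fixes are all hand-written stand-ins. The names `buggy_sum_positive`, `Hypothesis`, `failing_tests`, and `teach` are hypothetical. The student's only job is to pick (or write) a hypothesis about the cause of the bug; applying the code change and re-running the tests is delegated to the "agent."

```python
# Toy illustration only; all names are hypothetical and the "agent" is a stub.
# In HypoCompass, an LLM would generate the buggy code and perform the fix.
from dataclasses import dataclass
from typing import Callable, List, Tuple

TestCase = Tuple[List[int], int]  # (input list, expected output)


def buggy_sum_positive(nums: List[int]) -> int:
    """Simulated novice code: should return the sum of the positive numbers."""
    total = 0
    for i in range(1, len(nums)):  # bug: the loop skips nums[0]
        if nums[i] > 0:
            total += nums[i]
    return total


def fix_start_index(nums: List[int]) -> int:
    """Agent's fix for the hypothesis 'the loop skips the first element'."""
    total = 0
    for i in range(len(nums)):
        if nums[i] > 0:
            total += nums[i]
    return total


def fix_condition(nums: List[int]) -> int:
    """Agent's fix for the (incorrect) hypothesis 'the comparison should be >='."""
    total = 0
    for i in range(1, len(nums)):  # still skips nums[0], so the bug remains
        if nums[i] >= 0:
            total += nums[i]
    return total


@dataclass
class Hypothesis:
    description: str                  # the student's explanation of the cause
    fix: Callable[[List[int]], int]   # code change delegated to the agent


TESTS: List[TestCase] = [([1, 2, 3], 6), ([-1, 2, 3], 5), ([], 0), ([4], 4)]


def failing_tests(fn: Callable[[List[int]], int], tests: List[TestCase]) -> List[TestCase]:
    """Return the test cases on which fn disagrees with the expected output."""
    return [(inp, exp) for inp, exp in tests if fn(inp) != exp]


def teach(hypothesis: Hypothesis, tests: List[TestCase]) -> bool:
    """The student supplies a hypothesis; the agent applies the matching fix.

    Returns True if the patched code passes every test case.
    """
    return not failing_tests(hypothesis.fix, tests)


if __name__ == "__main__":
    print("Failing before any fix:", failing_tests(buggy_sum_positive, TESTS))
    candidates = [
        Hypothesis("The comparison should be >= instead of >", fix_condition),
        Hypothesis("The loop skips the first element", fix_start_index),
    ]
    for h in candidates:
        outcome = "fixed" if teach(h, TESTS) else "still failing"
        print(f"{h.description!r}: {outcome}")
```

Running the script shows that the failing inputs (`[1, 2, 3]` and `[4]`) both involve a positive first element, which is the evidence a student would use to reject the comparison hypothesis and settle on the loop-bounds hypothesis. Keeping the fix inside `Hypothesis` mirrors the idea that the student reasons about causes while the code edits are someone else's job.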

How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging