Abstract: Large Language Models possess skills such as answering questions, writing essays or solving programming exercises. Since these models are easily accessible, researchers have investigated their capabilities and risks for programming education. This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints. We investigate prompt practices that lead to effective next-step hints and use these insights to build our StAP-tutor. We evaluate this tutor by conducting an experiment with students, and performing expert assessments. Our findings show that most LLM-generated feedback messages describe one specific next step and are personalised to the student's code and approach. However, the hints may contain misleading information and lack sufficient detail when students approach the end of the assignment. This work demonstrates the potential for LLM-generated feedback, but further research is required to explore its practical implementation.
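
The abstract says the StAP-tutor is built on prompt practices for next-step hints. The sketch below is a minimal, hypothetical illustration of how one student's code snapshot might be wrapped into a hint request that asks for one specific, personalised next step. The template text, `HINT_PROMPT_TEMPLATE`, and `build_next_step_prompt` are assumptions for illustration, not the prompt the authors used. Sending the prompt to a model is left out because the abstract does not name a model or API.

```python
from textwrap import dedent

# Hypothetical template, NOT the StAP-tutor prompt from the paper. It encodes
# the hint properties the abstract reports: one specific next step,
# personalised to the student's own code, without revealing a full solution.
HINT_PROMPT_TEMPLATE = dedent("""\
    You are a tutor in an introductory Python programming course.

    Assignment description:
    {assignment}

    The student's current code (it may be incomplete or incorrect):
    <<<
    {student_code}
    >>>

    Write exactly one hint that describes the single next step the student
    should take, building on their current approach. Do not write the
    complete solution. Keep the hint under {max_words} words.
""")


def build_next_step_prompt(assignment: str, student_code: str, max_words: int = 60) -> str:
    """Fill the hypothetical template with one student's code snapshot."""
    if not assignment.strip():
        raise ValueError("assignment description must not be empty")
    # An empty submission is valid input: the student may not have started yet.
    code = student_code.strip() or "# (no code written yet)"
    return HINT_PROMPT_TEMPLATE.format(
        assignment=assignment.strip(),
        student_code=code,
        max_words=max_words,
    )
```

For example, `build_next_step_prompt("Write a function that returns the sum of a list of numbers.", "def total(xs):\n    result = 0")` produces a prompt that embeds the partial function and asks for a single next step; the resulting string would then be passed to whichever LLM the tutor uses.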

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper explores how large language models (LLMs) can be used to generate next-step hints for beginner programming exercises. Specifically, the research has three main goals:

1. **Generate Effective Next-Step Hints**: The researchers explore how to design prompts so that LLMs generate useful, specific next-step hints that help students progress through their programming exercises.
2. **Personalized Feedback**: The generated hints should be personalized to the student's code and problem-solving approach to make them more relevant and effective.
3. **Evaluate the Quality of Hints**: Through a student experiment and expert evaluations, the researchers aim to verify whether LLM-generated hints actually help students learn, and to identify potential problems with the hints.

### Background and Motivation

- **Importance of Automated Feedback**: In programming education, automated feedback tools can help students correct errors, understand concepts, and find their way through a problem.
- **Limitations of Existing Methods**: Traditional data-driven methods can generate next-step hints, but these hints often lack detailed explanations and may mislead students. These methods also usually require large amounts of historical data and manually defined knowledge components, which are time-consuming and complex to build.
- **Advantages of LLMs**: Large language models have strong text generation capabilities and can generate personalized next-step hints without relying on historical data or model solutions.

### Research Questions

- **RQ1**: To what extent can we use LLMs to generate informative and effective next-step hints for beginner Python programming exercises?
  - **SQ1**: What hint features are suitable for using LLMs to generate effective next-step hints?
  - **SQ2**: What are the views of students and experts on the quality of next-step hints generated by LLMs?

### Methods

1. **Dataset**: The researchers used the dataset by Lyulina et al., which contains code sequences recorded from beginners solving programming problems.
2. **Prompt Engineering**: Through an iterative process, the researchers designed and compared different prompts to determine the best format and attributes for the hints.
3. **Evaluation**:
   - **Student Experiment**: Three first-year AI students were recruited to use StAP-tutor (Step-Assisted Programming tutor) to complete programming tasks and rate the generated hints.
   - **Expert Evaluation**: Two experienced teaching assistants and instructors qualitatively evaluated the generated hints against nine criteria: hint type, informativeness, detail, personalization, appropriateness, specificity, misleading information, tone, and length.

### Conclusion

- **Main Findings**: Most LLM-generated hints described one specific next step and were personalized to the student's code and problem-solving approach. However, when students were close to completing the task, the hints could contain misleading information or lack sufficient detail.
- **Future Research Directions**: Although LLMs show potential for generating programming hints, further research is needed on their feasibility and effectiveness in practice, especially on preventing misleading information and improving hint detail.

Through this study, the authors demonstrate the potential of LLMs in programming education while also highlighting areas that need further improvement.
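
The expert evaluation uses nine named criteria. A minimal way to record and tally such assessments is sketched below, assuming each expert assigns one free-form label per criterion to each hint. The summary does not say how each criterion was scored, so the label format and the names `CRITERIA`, `check_assessment`, and `label_distribution` are hypothetical.

```python
from collections import Counter
from collections.abc import Mapping, Sequence

# The nine criteria named in the summary. How each one was scored is not
# stated, so each value is assumed to be a free-form label (e.g. "yes").
CRITERIA = (
    "hint type",
    "informativeness",
    "detail",
    "personalization",
    "appropriateness",
    "specificity",
    "misleading information",
    "tone",
    "length",
)

# One expert's assessment of one hint: criterion name -> label (assumption).
Assessment = Mapping[str, str]


def check_assessment(assessment: Assessment) -> None:
    """Raise ValueError unless the assessment covers exactly the nine criteria."""
    keys = set(assessment)
    missing = sorted(set(CRITERIA) - keys)
    unexpected = sorted(keys - set(CRITERIA))
    if missing or unexpected:
        raise ValueError(f"missing criteria: {missing}; unexpected keys: {unexpected}")


def label_distribution(assessments: Sequence[Assessment], criterion: str) -> Counter:
    """Count how often each label was given for one criterion across all hints."""
    if criterion not in CRITERIA:
        raise KeyError(criterion)
    for assessment in assessments:
        check_assessment(assessment)
    return Counter(assessment[criterion] for assessment in assessments)
```

Tallying labels per criterion is one straightforward way to report, for instance, how many hints were flagged for misleading information; it is not necessarily how the authors analysed their data.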

Next-Step Hint Generation for Introductory Programming Using Large Language Models

Howzat? Appealing to Expert Judgement for Evaluating Human and AI Next-Step Hints for Novice Programmers

Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

One Step at a Time: Combining LLMs and Static Analysis to Generate Next-Step Hints for Programming Tasks

Exploring the Potential of Large Language Models to Generate Formative Programming Feedback

Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology

Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models

Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation

Interactions with Prompt Problems: A New Way to Teach Programming with Large Language Models

Large Language Models (GPT) for automating feedback on programming assignments

AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails

Exploring the Responses of Large Language Models to Beginner Programmers' Help Requests

Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure

Using Large Language Models to Provide Explanatory Feedback to Human Tutors

Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences

Forgetful Large Language Models: Lessons Learned from Using LLMs in Robot Programming

Evaluating Language Models for Generating and Judging Programming Feedback

Evaluating the Application of Large Language Models to Generate Feedback in Programming Education

Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions?

Teach AI How to Code: Using Large Language Models as Teachable Agents for Programming Education

Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study