Next-Step Hint Generation for Introductory Programming Using Large Language Models

Lianne Roest, Hieke Keuning, Johan Jeuring
2023-12-04
Abstract: Large Language Models possess skills such as answering questions, writing essays or solving programming exercises. Since these models are easily accessible, researchers have investigated their capabilities and risks for programming education. This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints. We investigate prompt practices that lead to effective next-step hints and use these insights to build our StAP-tutor. We evaluate this tutor by conducting an experiment with students, and performing expert assessments. Our findings show that most LLM-generated feedback messages describe one specific next step and are personalised to the student's code and approach. However, the hints may contain misleading information and lack sufficient detail when students approach the end of the assignment. This work demonstrates the potential for LLM-generated feedback, but further research is required to explore its practical implementation.
Computers and Society, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores how to use large language models (LLMs) to generate next-step hints in beginner programming exercises. Specifically, the main goals of the research are as follows:

1. **Generate Effective Next-Step Hints**: The researchers aim to explore how to design appropriate prompts so that LLMs can generate useful and specific next-step hints that help students progress in their programming exercises.
2. **Personalized Feedback**: The generated hints should be personalized to the student's code and problem-solving approach to enhance their relevance and effectiveness.
3. **Evaluate the Quality of Hints**: Through experiments and expert evaluations, the researchers aim to verify whether the hints generated by LLMs actually help students learn, and to identify potential issues with the hints.

### Background and Motivation

- **Importance of Automated Feedback**: In programming education, automated feedback tools can help students correct errors and understand concepts, and can guide them through problem-solving.
- **Limitations of Existing Methods**: Traditional data-driven methods, while capable of generating next-step hints, often lack detailed explanations and may mislead students. Additionally, these methods usually require a large amount of historical data and manually defined knowledge components, which are time-consuming and complex to produce.
- **Advantages of LLMs**: Large language models have powerful text generation capabilities and can automatically generate personalized next-step hints without relying on historical data or model solutions.

### Research Questions

- **RQ1**: To what extent can we use LLMs to generate informative and effective next-step hints for beginner Python programming exercises?
  - **SQ1**: What hint features are suitable for using LLMs to generate effective next-step hints?
  - **SQ2**: What are the views of students and experts on the quality of next-step hints generated by LLMs?

### Methods

1. **Dataset Creation**: The researchers used the dataset by Lyulina et al., which collected code sequences from beginners solving programming problems.
2. **Prompt Engineering**: Through an iterative process, the researchers designed different prompts to find the best format and attributes for the hints.
3. **Evaluation**:
   - **Student Experiment**: Three first-year AI students were recruited to use StAP-tutor (Step-Assisted Programming tutor) to complete programming tasks and rate the generated hints.
   - **Expert Evaluation**: Two experienced teaching assistants and instructors conducted a qualitative evaluation of the generated hints using nine evaluation criteria: hint type, informativeness, detail, personalization, appropriateness, specificity, misleading information, tone, and length.

### Conclusion

- **Main Findings**: Most of the hints generated by LLMs described a specific next step and were personalized to the student's code and problem-solving approach. However, when students were close to completing the task, the hints might contain misleading information or lack sufficient detail.
- **Future Research Directions**: Although LLMs show potential in generating programming hints, further research is needed to explore their feasibility and effectiveness in practical applications, especially in preventing misleading information and improving hint detail.

Through this study, the authors demonstrate the potential application of LLMs in programming education while also highlighting areas that need further improvement.
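
To make the prompt-engineering step under Methods more concrete, here is a minimal Python sketch of how a next-step hint prompt might be assembled from a problem description and the student's current code. It is an illustration only: the `HintRequest` class, the `build_hint_prompt` function, and the instruction wording are all assumptions made for this example, not the prompts used in StAP-tutor, and the sketch stops before any call to an LLM.

```python
from dataclasses import dataclass


@dataclass
class HintRequest:
    """Inputs for one hint request (hypothetical structure, not from the paper)."""
    problem_description: str
    student_code: str


def build_hint_prompt(request: HintRequest, max_sentences: int = 2) -> str:
    """Assemble a plain-text prompt asking an LLM for one next-step hint."""
    # An empty editor is a valid state: the student may not have started yet.
    code = request.student_code.strip() or "# (the student has not written any code yet)"
    # Assumed instruction wording, chosen to reflect the hint properties the
    # summary mentions: one specific step, personalized, limited length.
    instructions = [
        "You are a tutor for an introductory Python course.",
        "Give exactly one next-step hint for the student's current code.",
        "Follow the student's own approach; do not reveal a full solution.",
        f"Use at most {max_sentences} sentences.",
    ]
    return "\n\n".join([
        "\n".join(instructions),
        "Problem description:\n" + request.problem_description.strip(),
        "Student code:\n" + code,
    ])


# Usage: build the prompt for a partially solved exercise and inspect it.
request = HintRequest(
    problem_description="Print the sum of the integers from 1 to n.",
    student_code="n = int(input())\ntotal = 0\n",
)
print(build_hint_prompt(request))
```

Keeping the instructions in a list makes it easy to vary one attribute at a time (for example the length limit or the ban on full solutions), which is the kind of iteration the prompt-engineering step describes.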