Kattis vs. ChatGPT: Assessment and Evaluation of Programming Tasks in the Age of Artificial Intelligence

Nora Dunder, Saga Lundborg, Olga Viberg, Jacqueline Wong
2023-12-02
Abstract: AI-powered education technologies can support students and teachers in computer science education. However, with the recent developments in generative AI, and especially the rapidly growing popularity of ChatGPT, the effectiveness of using large language models for solving programming tasks has been underexplored. The present study examines ChatGPT's ability to generate code solutions at different difficulty levels for introductory programming courses. We conducted an experiment where ChatGPT was tested on 127 randomly selected programming problems provided by Kattis, an automatic software grading tool for computer science programs, often used in higher education. The results showed that ChatGPT independently could solve 19 out of 127 programming tasks generated and assessed by Kattis. Further, ChatGPT was found to be able to generate accurate code solutions for simple problems but encountered difficulties with more complex programming tasks. The results contribute to the ongoing debate on the utility of AI-powered tools in programming education.
Artificial Intelligence, Computers and Society, Software Engineering
What problem does this paper attempt to address?
The paper examines how effective ChatGPT is in programming education, in particular how well it solves programming tasks. The study can be summarized as follows:

1. **Research Background**: With the development of artificial intelligence technology, especially the emergence of generative AI tools like ChatGPT, their application in programming education has attracted widespread attention. These tools can automatically generate code solutions, but their accuracy and applicability have not been fully verified.
2. **Research Method**: The researchers randomly selected 127 programming problems of varying difficulty levels and had ChatGPT attempt to solve them. The problems were provided by Kattis, an automatic software grading tool that is widely used in computer science courses to support teachers and students.
3. **Main Findings**:
   - ChatGPT performed well on simple problems but encountered difficulties with complex tasks.
   - Out of the 127 programming tasks, ChatGPT successfully solved only 19 (about 15%), most of which were lower-difficulty tasks.
   - For the unsuccessful solutions, the main error types were "wrong answers," "runtime errors," and "timeouts."
4. **Discussion and Conclusion**:
   - The results indicate that although ChatGPT can handle some simple programming tasks, it is currently insufficient to completely replace human roles in programming education.
   - Over-reliance on ChatGPT may hinder the development of students' programming skills, especially in introductory courses.
   - Educators are advised to design teaching activities that guide students to use AI tools like ChatGPT critically, strengthening their self-learning ability and critical thinking skills.
5. **Limitations**:
   - The limited number of test samples (127 problems) makes it difficult to generalize the results.
   - Updates to the ChatGPT version may have affected the consistency of the test results.
   - The assumption that students would directly copy and paste the task descriptions provided by Kattis may not reflect the actual situation.

In summary, this study reveals the potential advantages and limitations of ChatGPT in programming education and provides directions for future research.
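The solve rate quoted in the summary (19 of 127 tasks, about 15%) can be checked with a short calculation. The sketch below is only an illustration: it uses the two counts reported in the paper, and the names `SOLVED`, `TOTAL`, and `solve_rate` are hypothetical helpers introduced here, not anything from the study or from Kattis.

```python
# Counts as reported in the paper; nothing else here comes from the study.
SOLVED = 19   # tasks ChatGPT solved independently
TOTAL = 127   # tasks randomly selected from Kattis


def solve_rate(solved: int, total: int) -> float:
    """Return the fraction of tasks solved (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= solved <= total:
        raise ValueError("solved must be between 0 and total")
    return solved / total


rate = solve_rate(SOLVED, TOTAL)
unsolved = TOTAL - SOLVED

print(f"Solved:   {SOLVED}/{TOTAL} = {rate:.1%}")
print(f"Unsolved: {unsolved}/{TOTAL} = {1 - rate:.1%}")
```

The paper does not give a per-verdict breakdown in this summary, so the sketch deliberately stops at the overall rate and does not split the unsolved tasks into wrong answers, runtime errors, and timeouts.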