ErrorCLR: Semantic Error Classification, Localization and Repair for Introductory Programming Assignments

Siqi Han,Yu Wang,Xuesong Lu
DOI: https://doi.org/10.1145/3539618.3591680
2023-01-01
Abstract:Programming education at scale increasingly relies on automated feedback to help students learn to program. An important form of feedback is to point out semantic errors in student programs and provide hints for program repair. Such automated feedback depends essentially on solving the tasks of classification, localization and repair of semantic errors. Although there are datasets for the tasks, we observe that they do not have the annotations supporting all three tasks. As such, existing approaches for semantic error feedback treat error classification, localization and repair as independent tasks, resulting in sub-optimal performance on each task. Moreover, existing datasets either contain few programming assignments or have few programs for each assignment. Therefore, existing approaches often leverage rule-based methods and evaluate them with a small number of programming assignments. To tackle the problems, we first describe the creation of a new dataset COJ2022 that contains 5,914 C programs with semantic errors submitted to 498 different assignments in an introductory programming course, where each program is annotated with the error types and locations and is coupled with the repaired program submitted by the same student. We show the advantages of COJ2022 over existing datasets on various aspects. Second, we treat semantic error classification, localization and repair as dependent tasks, and propose a novel two-stage method ErrorCLR to solve them. Specifically, in the first stage we train a model based on graph matching networks to jointly classify and localize potential semantic errors in student programs, and in the second stage we mask error spans in buggy programs using information of error types and locations and train a CodeT5 model to predict correct spans. The predicted spans replace the error spans to form repaired programs. Experimental results show that ErrorCLR remarkably outperforms the comparative methods for all three tasks on COJ2022 and other public datasets. We also conduct a case study to visualize and interpret what is learned by the graph matching network in ErrorCLR. We have released the source code and COJ2022 at https://github.com/DaSESmartEdu/ErrorCLR.
What problem does this paper attempt to address?