FastFixer: An Efficient and Effective Approach for Repairing Programming Assignments

Fang Liu, Zhenwei Liu, Qianhui Zhao, Jing Jiang, Li Zhang, Ge Li, Zian Sun, Zhongqi Li, Yuchi Ma
2024-10-11
Abstract: Providing personalized and timely feedback for students' programming assignments is useful for programming education. Automated program repair (APR) techniques have been used to fix the bugs in programming assignments, where the Large Language Models (LLMs) based approaches have shown promising results. Given the growing complexity of identifying and fixing bugs in advanced programming assignments, current fine-tuning strategies for APR are inadequate in guiding the LLM to identify bugs and make accurate edits during the generative repair process. Furthermore, the autoregressive decoding approach employed by the LLM could potentially impede the efficiency of the repair, thereby hindering the ability to provide timely feedback. To tackle these challenges, we propose FastFixer, an efficient and effective approach for programming assignment repair. To assist the LLM in accurately identifying and repairing bugs, we first propose a novel repair-oriented fine-tuning strategy, aiming to enhance the LLM's attention towards learning how to generate the necessary patch and its associated context. Furthermore, to speed up the patch generation, we propose an inference acceleration approach that is specifically tailored for the program repair task. The evaluation results demonstrate that FastFixer obtains an overall improvement of 20.46% in assignment fixing when compared to the state-of-the-art baseline. Considering the repair efficiency, FastFixer achieves a remarkable inference speedup of 16.67 times compared to the autoregressive decoding algorithm.
Computers and Society,Software Engineering
What problem does this paper attempt to address?
This paper addresses how to provide personalized and timely feedback on programming assignments, particularly complex advanced ones, where Automated Program Repair (APR) techniques struggle to identify and fix errors. Specifically, existing methods have two problems:

1. **Insufficient fine-tuning strategies**: Existing fine-tuning strategies do not sufficiently guide large language models (LLMs) to accurately identify and fix errors in programming assignments.
2. **Low efficiency of autoregressive decoding**: The autoregressive decoding used by LLMs is slow, which reduces repair speed and hinders timely feedback.

To address these challenges, the authors propose FastFixer, which aims to improve both the efficiency and the effectiveness of programming assignment repair. The main contributions of the paper are:

- **Proposing a new repair-oriented fine-tuning strategy**: By strengthening the LLM's attention to generating the necessary patch and its related context, it helps the LLM identify and fix errors more effectively.
- **Proposing an inference acceleration algorithm**: Designed specifically for program repair tasks, it uses the defective code as a draft to accelerate inference. According to the authors, this is the first exploration of inference acceleration in LLM-based APR methods.
- **Conducting a comprehensive evaluation**: FastFixer correctly repairs 312 defective programs from advanced programming assignments in the Defects4DS dataset, a 20.46% improvement over the best existing method, and achieves an inference speedup of 16.67 times over autoregressive decoding.

### Formula Summary

1. **Similarity Calculation Formula**:
\[ \text{sim}(e, m)=\frac{1}{1 + \log(\text{dist}(e, m))}+1 \]
where \(e\in Y_e\), \(m\in Y_m\), and \(\text{dist}\) is the Levenshtein distance between two statements.
2. **Weight Calculation Formula**:
\[ \text{weight}(e)=\max\left(1,\sum_{m\in Y_m}\text{sim}(e, m)\right) \]
3. **Repair-Oriented Fine-Tuning Loss Function**:
\[ L_{\text{ROFT}}=\sum_{i = 1}^{n}L_{\text{origin}}(q_i|X, q_1, q_2,\ldots, q_{i - 1})\cdot k_i \]
where \(L_{\text{origin}}\) is the original cross-entropy loss, and \(k_i\) is the weight of each target statement \(q_i\) obtained from the modification-focused mask vector.

Through these improvements, FastFixer improves repair accuracy and significantly increases repair speed, making it possible to provide timely feedback for complex programming assignments.
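The three formulas in the summary can be tried out numerically with a small sketch. This is not the authors' implementation: the function names (`levenshtein`, `sim`, `weight`, `roft_loss`) are hypothetical, the logarithm is assumed to be natural, and the distance is clamped to at least 1 so that the logarithm stays defined when two statements are identical. The summary does not spell out how the statement weights are turned into the per-statement factors \(k_i\), so `roft_loss` simply takes them as input.

```python
import math
from typing import Iterable, Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sim(e: str, m: str) -> float:
    """sim(e, m) = 1 / (1 + log(dist(e, m))) + 1.

    Assumptions (not stated in the source): natural logarithm, and
    dist clamped to >= 1 so that log(dist) is defined and non-negative.
    """
    dist = max(1, levenshtein(e, m))
    return 1.0 / (1.0 + math.log(dist)) + 1.0


def weight(e: str, modified: Iterable[str]) -> float:
    """weight(e) = max(1, sum of sim(e, m) over all m in Y_m)."""
    return max(1.0, sum(sim(e, m) for m in modified))


def roft_loss(statement_losses: Sequence[float], k: Sequence[float]) -> float:
    """L_ROFT = sum_i L_origin(q_i | ...) * k_i for precomputed per-statement losses."""
    if len(statement_losses) != len(k):
        raise ValueError("need exactly one weight per target statement")
    return sum(loss * w for loss, w in zip(statement_losses, k))


if __name__ == "__main__":
    # Hypothetical statements, used only to exercise the formulas.
    y_m = ["sum += a[i];", "return sum;"]
    print(weight("sum += a[j];", y_m))
    print(roft_loss([0.5, 1.0], [1.0, 2.0]))
```

With these definitions, statements that are close in edit distance to a modified statement receive a larger weight, so their loss terms count more in the weighted sum. In a real training loop the per-statement losses would come from the model's cross-entropy rather than from hand-written numbers.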