Improving Code Refinement for Code Review Via Input Reconstruction and Ensemble Learning

Jiawei Lu, Zhijie Tang, Zhongxin Liu
DOI: https://doi.org/10.1109/apsec60848.2023.00026
2023-01-01
Abstract: Code review is crucial for ensuring the quality of source code in software development. Automating the code review process is essential to save time and reduce costs, as manually reviewing code can be time-consuming and challenging for developers. Code refinement, an important task for automating code review, aims to automatically modify the code under review to address reviewers' comments. Previous research has fine-tuned pre-trained models like CodeT5 and CodeReviewer for code refinement, showing promising results. However, fine-tuning these models can make them forget the knowledge learned during pre-training and lead to suboptimal performance. To overcome this challenge, we employ an information retrieval method to enable the model to recall its learned knowledge. Furthermore, we propose using prompt templates to reconstruct the input and align the formats of the input data used during fine-tuning and pre-training, thus alleviating knowledge forgetting. Using the retrieval reconstruction and prompt reconstruction methods, we create multiple models that are highly complementary. An ensemble learning method is then employed to identify the most promising output among the outputs of these models. Our ensemble model achieves an Exact Match (EM) score of 36.32, surpassing the state-of-the-art CodeReviewer model by 19.3% and the popular GPT-3.5-Turbo model by 49.6%.
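To make the evaluation and selection steps concrete, the following is a minimal sketch of how an Exact Match score and a simple output selection could be computed. It is an illustration only, not the authors' implementation: the abstract does not specify how the ensemble chooses an output, so majority voting is swapped in here as a stand-in, and the whitespace normalization in `normalize` is likewise an assumption. The function names `normalize`, `exact_match`, and `select_by_vote` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def normalize(code: str) -> str:
    """Collapse runs of whitespace (assumed normalization, not from the paper)."""
    return " ".join(code.split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions equal to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def select_by_vote(candidates: list[str]) -> str:
    """Pick the output that most models agree on (stand-in for the ensemble).

    Ties are broken in favor of the earliest candidate in the list.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    counts = Counter(normalize(c) for c in candidates)
    best = max(counts.values())
    for candidate in candidates:
        if counts[normalize(candidate)] == best:
            return candidate
    raise AssertionError("unreachable: some candidate always has the top count")


if __name__ == "__main__":
    # Hypothetical outputs from three differently reconstructed models.
    outputs = ["return a + b", "return a+b", "return a  +  b"]
    print(select_by_vote(outputs))
    print(exact_match(["x = 1", "y = 2"], ["x = 1", "y = 3"]))
```

Under these assumptions, two of the three hypothetical outputs agree after normalization, so the first of them is selected; a learned selector, as the abstract suggests, would replace `select_by_vote`.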