Divide-and-Conquer: Automating Code Revisions Via Localization-and-Revision
Shangwen Wang, Bo Lin, Liqian Chen, Xiaoguang Mao
DOI: https://doi.org/10.1145/3697013
IF: 3.685
2024-01-01
ACM Transactions on Software Engineering and Methodology
Abstract: Despite its effectiveness in ensuring software quality, code review remains a labor-intensive and time-consuming task. To alleviate this burden on developers, researchers have proposed automating code review activities, with a particular focus on automating code revisions. This automation can benefit both code authors, who are relieved from the manual task of code revision, and code reviewers, who are spared from addressing minor code flaws through manual comments. While current code revision approaches have shown promising results, they typically operate within a single phase, in which the code requiring revision is treated as the input of a deep learning model, and the revised code is directly generated through a sequence-to-sequence transformation. Consequently, these approaches tackle the challenges of localization (i.e., where to revise) and revision (i.e., how to revise) simultaneously. Attempting to handle the entire complex process with a single model goes against the principle of “Divide-and-Conquer”, which encourages breaking down complex problems into smaller sub-problems and addressing them individually. In fact, we have observed that existing code revision approaches often yield inaccurate results in both the localization and revision phases. In this paper, we present a two-phase code revision approach that aims to overcome these limitations by adhering to the “Divide-and-Conquer” principle. Our approach comprises two key components: a localizer, responsible for identifying the specific parts of the input code that require revisions, and a reviser, tasked with generating the revised code based on the localization result. Extensive experiments conducted on two widely used datasets demonstrate the substantial superiority of our approach over existing code revision approaches.
For instance, when revising code based on the code reviewer’s comments, our approach achieves a success rate of over 20% in implementing the ground-truth code revisions. In comparison, the widely used pre-trained model CodeT5 achieves a success rate of less than 16% on the same test set, which contains 16K+ cases.
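The two-phase structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper both components are learned models, whereas here `toy_localizer` and `toy_reviser` are hypothetical rule-based stand-ins, and the function names, signatures, and the `== None` example are assumptions chosen only to show how a localization result can constrain the revision step.

```python
from typing import Callable, List, Set

# Hypothetical interfaces (assumptions, not taken from the paper):
# a localizer maps (code lines, reviewer comment) to flagged line indices,
# a reviser maps (code lines, flagged indices, comment) to revised lines.
Localizer = Callable[[List[str], str], Set[int]]
Reviser = Callable[[List[str], Set[int], str], List[str]]


def toy_localizer(lines: List[str], comment: str) -> Set[int]:
    """Phase 1 (where to revise): flag lines that compare against None with '=='.

    A rule-based stand-in for a learned localizer; it ignores the comment.
    """
    return {i for i, line in enumerate(lines) if "== None" in line}


def toy_reviser(lines: List[str], flagged: Set[int], comment: str) -> List[str]:
    """Phase 2 (how to revise): rewrite only the flagged lines.

    A rule-based stand-in for a learned reviser; unflagged lines pass through.
    """
    return [
        line.replace("== None", "is None") if i in flagged else line
        for i, line in enumerate(lines)
    ]


def revise(
    code: str,
    comment: str,
    localizer: Localizer = toy_localizer,
    reviser: Reviser = toy_reviser,
) -> str:
    """Run localization first, then revision conditioned on its result."""
    lines = code.splitlines()
    flagged = localizer(lines, comment)
    return "\n".join(reviser(lines, flagged, comment))


if __name__ == "__main__":
    original = "def f(x):\n    if x == None:\n        return 0\n    return x"
    print(revise(original, "Use 'is None' instead of '== None'."))
```

Keeping the two phases behind separate callables means each can be replaced or evaluated on its own, which mirrors the separation of "where to revise" from "how to revise" that the abstract argues for.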