Automating Zero-Shot Patch Porting for Hard Forks

Shengyi Pan, You Wang, Zhongxin Liu, Xing Hu, Xin Xia, Shanping Li
2024-04-28
Abstract: Forking is a typical way of code reuse, which provides a simple way for developers to create a variant software (denoted as hard fork) by copying and modifying an existing codebase. Despite its benefits, forking also leads to duplicated effort in software maintenance. Developers need to port patches across hard forks to address similar bugs or implement similar features. Due to the divergence between the source project and the hard fork, patch porting is complicated and requires adapting the patch to different implementations of the same functionality. In this work, we take the first step to automate patch porting for hard forks under a zero-shot setting. We first conduct an empirical study of the patches ported from Vim to Neovim over the last ten years to investigate the necessity of patch porting and the potential flaws in the current practice. We then propose a large language model (LLM) based approach (namely PPatHF) to automatically port patches for hard forks on a function-wise basis. Specifically, PPatHF is composed of a reduction module and a porting module. Given the pre- and post-patch versions of a function from the reference project and the corresponding function from the target project, the reduction module first slims the input functions by removing code snippets less relevant to the patch. Then, the porting module leverages an LLM to apply the patch to the function from the target project. We evaluate PPatHF on 310 Neovim patches ported from Vim. The experimental results show that PPatHF outperforms the baselines significantly. Specifically, PPatHF can correctly port 131 (42.3%) patches and automate 57% of the manual edits required for the developer to port the patch.
Software Engineering
What problem does this paper attempt to address?
This paper addresses automated zero-shot patch porting for hard forks. Specifically, the authors focus on how to automatically port patches from the source project to a hard fork when no historical patch-porting data is available. The problem is described in detail below.

### Problem Background

A hard fork is a variant software created by copying and modifying an existing codebase. Although forking is a simple and flexible way to create customized software versions, it also leads to duplicated work in software maintenance. Developers need to port patches across hard forks to fix similar bugs or implement similar features. Because the source project and the hard fork diverge, patch porting is complicated: the patch usually has to be adapted to a different implementation of the same functionality.

### Main Challenges

1. **Semantic Understanding and Implementation Differences**:
   - Although the hard fork shares a similar design logic with the source project, their concrete implementations gradually diverge over time.
   - An automated solution must correctly understand the semantics of the patch so that it preserves the modification logic while accounting for the implementation differences.
2. **Automation in the Zero-Shot Setting**:
   - In the zero-shot setting, only the patch from the source project is available, without any historical patch-porting data or test cases.
   - As a result, most existing methods cannot be applied directly to the patch-porting task for hard forks.

### Solution

To solve these problems, the authors propose PPatHF (Patch Porting for Hard Forks), an approach based on a large language model (LLM) that ports patches function by function. It consists of two modules: the **Reduction Module** and the **Porting Module**.

- **Reduction Module**: Slims the input functions by removing code snippets that are less relevant to the patch, so that the LLM receives shorter inputs and can handle the porting task more effectively.
- **Porting Module**: Uses the LLM to apply the patch from the source project to the corresponding function in the hard fork while preserving the semantics of the patch.

### Experimental Results

Evaluated on 310 Neovim patches ported from Vim, PPatHF significantly outperforms the baselines. Specifically:

- PPatHF correctly ports 131 (42.3%) of the patches.
- It automates 57% of the manual edits a developer would need to make to port a patch.

### Summary

By automating zero-shot patch porting for hard forks, this paper aims to reduce developers' manual workload and make porting more efficient, thereby reducing the security risks caused by delayed porting.
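
### Illustrative Sketch

To make the two-module idea more concrete, the following is a minimal sketch of how a reduction step and a porting prompt could fit together. It is not the authors' implementation: the line-based diff heuristic, the `context` window, the `PLACEHOLDER` marker, the prompt wording, and all function names are assumptions introduced here for illustration. The sketch only reduces the two reference versions and leaves the target function untouched, because deciding which target lines are relevant would require aligning the target with the reference, which is beyond this example.

```python
import difflib

PLACEHOLDER = "/* ... */"  # hypothetical marker for code removed by reduction


def _span(lo, hi, n):
    """Indices lo..hi-1; for an empty span, its neighbouring lines within [0, n)."""
    if lo == hi:
        return {i for i in (lo - 1, lo) if 0 <= i < n}
    return set(range(lo, hi))


def touched_lines(pre, post):
    """Return (pre_indices, post_indices) of lines the patch changes or borders."""
    pre_hit, post_hit = set(), set()
    matcher = difflib.SequenceMatcher(a=pre, b=post, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pre_hit |= _span(i1, i2, len(pre))
            post_hit |= _span(j1, j2, len(post))
    return pre_hit, post_hit


def reduce_function(lines, touched, context=2):
    """Keep touched lines plus `context` neighbours; collapse the rest."""
    keep = set()
    for i in touched:
        keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    if lines:
        keep.update({0, len(lines) - 1})  # always keep first and last line
    reduced, omitted = [], False
    for i, line in enumerate(lines):
        if i in keep:
            reduced.append(line)
            omitted = False
        elif not omitted:  # emit one placeholder per omitted run
            reduced.append(PLACEHOLDER)
            omitted = True
    return reduced


def build_prompt(ref_pre, ref_post, target, context=2):
    """Assemble a hypothetical porting prompt from the reduced reference code."""
    pre_hit, post_hit = touched_lines(ref_pre, ref_post)
    sections = [
        ("Reference function before the patch",
         reduce_function(ref_pre, pre_hit, context)),
        ("Reference function after the patch",
         reduce_function(ref_post, post_hit, context)),
        ("Target function (left unreduced in this sketch)", target),
    ]
    body = "\n\n".join(
        f"## {title}\n" + "\n".join(lines) for title, lines in sections
    )
    return body + "\n\n## Task\nApply the same patch to the target function."
```

The returned string would then be sent to an LLM of the reader's choice; that call is deliberately left out because the summary above does not specify a model or API. Collapsing each omitted run into a single placeholder keeps the function skeleton visible while shortening the input, which is the stated purpose of the reduction step.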