There are More Fish in the Sea: Automated Vulnerability Repair via Binary Templates

Bo Lin,Shangwen Wang,Liqian Chen,Xiaoguang Mao
2024-11-27
Abstract: As software vulnerabilities increase in both volume and complexity, vendors often struggle to repair them promptly. Automated vulnerability repair has emerged as a promising solution to reduce the burden of manual debugging and fixing activities. However, existing techniques exclusively focus on repairing vulnerabilities at the source code level, which has various limitations. For example, they are not applicable to those (e.g., users or security analysts) who do not have access to the source code. Consequently, this restricts the practical application of these techniques, especially in cases where vendors are unable to provide timely patches. In this paper, we aim to address the above limitations by performing vulnerability repair at the binary code level, and accordingly propose TemVUR, a template-based automated vulnerability repair approach for Java binaries. Building on the literature, we collect fix templates from both existing template-based automated program repair approaches and vulnerability-specific analyses, which are then implemented for Java binaries. Our systematic application of these templates effectively mitigates vulnerabilities: experiments on the Vul4J dataset demonstrate that TemVUR successfully repairs 11 vulnerabilities, marking a notable 57.1% improvement over current repair techniques. Moreover, TemVUR securely fixes 66.7% more vulnerabilities compared to leading techniques (15 vs. 9), underscoring its effectiveness in mitigating the risks posed by these vulnerabilities. To assess the generalizability of TemVUR, we curate the ManyVuls4J dataset, which goes beyond Vul4J to encompass a wider diversity of vulnerabilities, with 30% more vulnerabilities than its predecessor (increasing from 79 to 103). The evaluation on ManyVuls4J reaffirms TemVUR's effectiveness and generalizability across a diverse set of real-world vulnerabilities.
Software Engineering
What problem does this paper attempt to address?
The problem this paper addresses: as software vulnerabilities grow in number and complexity, existing automated vulnerability repair (AVR) techniques work mainly at the source-code level and neglect repair at the binary-code level. This leads to two main problems:

1. **Source code unavailable**: Many users and security analysts do not have permission to access the source code, especially for commercial off-the-shelf software, which vendors usually encrypt or obfuscate. These users therefore cannot use existing source-code-based automated repair tools to fix the vulnerabilities they encounter.
2. **Low efficiency**: Existing vulnerability repair techniques must compile and load every candidate patch they generate, which is very time-consuming. For example, compiling and verifying patches may account for 92.8% of the total execution time of an automated program repair tool. For vendors that need to handle tens of thousands of vulnerabilities every year, this process is both slow and expensive.

To solve these problems, the authors propose TemVUR, a new automated vulnerability repair method that operates at the Java bytecode (i.e., binary file) level. With this method, users can repair vulnerabilities promptly without the source code and can skip the time-consuming compilation step, which improves repair efficiency.

### The main contributions of TemVUR include:

1. **New dimension**: It is the first attempt to prevent vulnerabilities from being exploited at the binary level, removing the dependence on source code and widening the range of application.
2. **State-of-the-art AVR tool**: The paper proposes TemVUR, a template-based AVR tool that achieves state-of-the-art performance in vulnerability repair.
3. **New dataset**: A new dataset named ManyVuls4J is created. It is 30% larger than the largest existing dataset, and all of its vulnerabilities come from real-world instances and are manually verified.
4. **Available resources**: The reproduction package, the generated patches, and the dataset used in the experiments are released to support reproduction by the community and future research.

Through these improvements, TemVUR not only improves the effectiveness and security of vulnerability repair, but also shows the potential and advantages of binary-level repair.
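
To make the idea of a bytecode-level fix template more concrete, here is a minimal Python sketch. It is not TemVUR's implementation: the `Insn` class, the `insert_null_guard` function, and the label name are hypothetical, and the instruction list is a toy model that borrows JVM opcode names (`aload`, `ifnonnull`, `return`) but uses variable names instead of local-variable slots. It assumes one common kind of template, a null check that exits the method early, and shows how such a patch can be produced by editing the instruction sequence directly, with no source code and no recompilation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Insn:
    """One instruction in a toy, bytecode-like listing (hypothetical model)."""
    opcode: str
    operand: Optional[str] = None


def insert_null_guard(code: List[Insn], index: int, var: str,
                      label: str = "L_guard_ok") -> List[Insn]:
    """Return a patched copy of `code` with a null guard inserted before `index`.

    The guard exits the method early when `var` is null, so the instructions
    from `index` onward only run with a non-null reference. The input list is
    left unmodified.
    """
    if not 0 <= index <= len(code):
        raise IndexError("insertion index is outside the instruction list")
    guard = [
        Insn("aload", var),        # push the reference under test
        Insn("ifnonnull", label),  # jump past the early exit when non-null
        Insn("return"),            # otherwise leave the (void) method
        Insn("label", label),      # normal execution resumes here
    ]
    return code[:index] + guard + code[index:]


# Toy method body that dereferences `input` without checking it first.
original = [
    Insn("aload", "input"),
    Insn("invokevirtual", "java/lang/String.length()I"),
    Insn("pop"),
    Insn("return"),
]

patched = insert_null_guard(original, 0, "input")
for insn in patched:
    print(insn.opcode, insn.operand or "")
```

A real tool would work on class files through a bytecode manipulation library and would also have to keep stack frames and jump targets consistent. The sketch only illustrates why a candidate patch built this way can be validated without a compilation step.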