PatUntrack: Automated Generating Patch Examples for Issue Reports without Tracked Insecure Code

Ziyou Jiang, Lin Shi, Guowei Yang, Qing Wang
2024-08-16
Abstract: Security patches are essential for enhancing the stability and robustness of projects in the software community. While vulnerabilities are officially expected to be patched before being disclosed, patching vulnerabilities is complicated and remains a struggle for many organizations. To patch vulnerabilities, security practitioners typically track vulnerable issue reports (IRs) and analyze their relevant insecure code to generate potential patches. However, the relevant insecure code may not be explicitly specified, and practitioners cannot track it in the repositories, which limits their ability to generate patches. In such cases, providing examples of insecure code and the corresponding patches would help security developers better locate and fix the insecure code. In this paper, we propose PatUntrack to automatically generate patch examples from IRs without tracked insecure code. It auto-prompts Large Language Models (LLMs) to make them applicable to vulnerability analysis. It first generates the complete description of the Vulnerability-Triggering Path (VTP) from vulnerable IRs. Then, it corrects hallucinations in the VTP description with external golden knowledge. Finally, it generates Top-K pairs of Insecure Code and Patch Example based on the corrected VTP description. To evaluate its performance, we conducted experiments on 5,465 vulnerable IRs. The experimental results show that PatUntrack achieves the highest performance and improves on the traditional LLM baselines by +14.6% (Fix@10) on average in patch example generation. Furthermore, PatUntrack was applied to generate patch examples for 76 newly disclosed vulnerable IRs. 27 out of 37 replies from the authors of these IRs confirmed the usefulness of the patch examples generated by PatUntrack, indicating that they can benefit from these examples when patching the vulnerabilities.
Cryptography and Security, Artificial Intelligence, Software Engineering
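The three-stage pipeline described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not PatUntrack's implementation: the LLM is modeled as a plain prompt-to-text function, the VTP is a hypothetical list of operation nodes joined by linear edges, hallucination correction is reduced to a keyword lookup in a "golden" dictionary, and every name (`VTP`, `complete_vtp`, `correct_vtp`, `generate_examples`, `pat_untrack_sketch`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumption: an LLM is modeled as a function from prompt to completion,
# so a real client or a test stub can be plugged in.
LLM = Callable[[str], str]


@dataclass
class VTP:
    """Hypothetical Vulnerability-Triggering Path: operation nodes joined by edges."""
    nodes: list[str] = field(default_factory=list)
    edges: list[tuple[int, int]] = field(default_factory=list)


@dataclass
class PatchExample:
    insecure_code: str
    patch: str


def complete_vtp(ir_text: str, llm: LLM) -> VTP:
    """Stage 1: ask the LLM to spell out the triggering operations, one per line."""
    reply = llm("List the operations that trigger this vulnerability:\n" + ir_text)
    nodes = [line.strip() for line in reply.splitlines() if line.strip()]
    # Toy choice: link consecutive operations into a linear path.
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return VTP(nodes, edges)


def correct_vtp(vtp: VTP, golden: dict[str, str]) -> VTP:
    """Stage 2 (toy stand-in): overwrite any node that mentions a lowercase
    keyword from the golden knowledge with that keyword's description."""
    fixed = []
    for node in vtp.nodes:
        match = next((desc for key, desc in golden.items() if key in node.lower()), None)
        fixed.append(match if match is not None else node)
    return VTP(fixed, list(vtp.edges))


def generate_examples(vtp: VTP, llm: LLM, k: int) -> list[PatchExample]:
    """Stage 3: request K (insecure code, patch) pairs conditioned on the corrected VTP.
    Assumes the LLM separates the two parts with a line containing only '---'."""
    path = " -> ".join(vtp.nodes)
    examples = []
    for i in range(k):
        reply = llm(f"Candidate {i + 1}. Write insecure code and its patch for: {path}")
        insecure, _, patch = reply.partition("\n---\n")
        examples.append(PatchExample(insecure.strip(), patch.strip()))
    return examples


def pat_untrack_sketch(ir_text: str, llm: LLM, golden: dict[str, str], k: int = 10) -> list[PatchExample]:
    """Chain the three stages: complete the VTP, correct it, generate Top-K pairs."""
    return generate_examples(correct_vtp(complete_vtp(ir_text, llm), golden), llm, k)


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stub LLM: returns a fixed VTP for stage 1 and a fixed pair for stage 3.
        if prompt.startswith("List the operations"):
            return "read user path\nopen file without checks"
        return "open(path)\n---\nopen(os.path.basename(path))"

    golden = {"without checks": "open file without validating the path (CWE-22)"}
    examples = pat_untrack_sketch("Path traversal in upload handler", fake_llm, golden, k=3)
    print(len(examples), examples[0].patch)  # prints: 3 open(os.path.basename(path))
```

Injecting the LLM as a function keeps each stage testable with a stub; a realistic version would replace the keyword lookup with retrieval from golden sources such as SARD or OWASP.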
### What problem does this paper attempt to solve?

This paper addresses a key problem in generating security patches in the open-source software (OSS) community: **how to automatically generate vulnerability-fixing examples when the insecure code cannot be traced**. When developers report vulnerabilities, they often do not point out the relevant insecure code, which makes it difficult for security experts to generate accurate patches. This not only complicates vulnerability fixing but also leaves a window in which attackers may exploit the vulnerability.

#### Background and Challenges

1. **Importance of Vulnerability Fixing**
   - Security patches are crucial for improving the stability and robustness of projects.
   - According to the CERT guidelines, vulnerabilities should be fixed before public disclosure, but in practice many organizations struggle to fix them.
2. **Limitations of Existing Methods**
   - Developers usually report vulnerabilities through issue reports (IRs), and security experts must analyze these reports and track the relevant code to generate patches.
   - However, IRs often do not indicate where the insecure code is located, so patches cannot be generated effectively.
3. **Risks of Vulnerability Exploitation**
   - According to the paper, 69.0% of vulnerability reports fail to trace the insecure code, and 71.7% of these vulnerabilities are successfully exploited, which underscores the urgency of fixing them.

#### Solutions

To address these problems, the authors propose an automated method named **PatUntrack** that generates patch examples from issue reports without traced insecure code. The main contributions of PatUntrack are:

1. **Technology**
   - An automated patch-example generation method that does not rely on source code for guidance.
   - This is the first attempt to generate patch examples without tracing the insecure code.
2. **Evaluation**
   - In experiments, PatUntrack significantly outperforms traditional large language model (LLM) baselines on patch example generation, improving by 17.7% (MatchFix) and 14.6% (Fix@10).
   - A manual evaluation on newly disclosed vulnerability reports further confirms its effectiveness in practice.
3. **Data**
   - The datasets and source code are publicly released so that other researchers can reproduce the results and apply PatUntrack more broadly.

#### Method Overview

The PatUntrack workflow consists of three steps:

1. **Generate a complete Vulnerability-Triggering Path (VTP) description**
   - Extract the original VTP description from the issue report and supplement the missing operation nodes and edges.
2. **Correct potential hallucinations**
   - Use external golden knowledge bases (such as SARD and OWASP) to correct erroneous or misleading information in the VTP description.
3. **Generate insecure code and patch examples**
   - Based on the corrected VTP description, predict the patch type and generate Top-K pairs of insecure code and patch examples.

With this method, PatUntrack aims to help security experts generate patches faster and more accurately, so that vulnerabilities are fixed promptly and the risk of attack is reduced.

### Summary

PatUntrack automatically generates effective patch examples without explicitly tracing the insecure code, improving the efficiency and accuracy of vulnerability fixing and thereby strengthening the security of open-source software.
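The evaluation reports results as Fix@10. This summary does not reproduce the paper's exact definition; assuming that Fix@K, like pass@k, counts an issue report as a success when at least one of its top-K generated patch examples is judged to fix the vulnerability, the metric could be computed as in the sketch below. The function name `fix_at_k` and the boolean judgment layout are hypothetical.

```python
from typing import Sequence


def fix_at_k(judgments: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of issue reports whose top-k candidates contain at least one fix.

    Assumed layout: judgments[i][j] is True if the j-th ranked patch example
    for the i-th issue report was judged to fix the vulnerability.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not judgments:
        return 0.0
    # An issue report counts as a hit if any of its first k candidates is a fix.
    hits = sum(1 for ranked in judgments if any(ranked[:k]))
    return hits / len(judgments)


if __name__ == "__main__":
    # Three issue reports: fixed at rank 2, never fixed, fixed at rank 1.
    labels = [[False, True], [False, False], [True]]
    print(fix_at_k(labels, 1), fix_at_k(labels, 2))
```

Under this assumed definition, Fix@K can only grow as K increases, which is why Top-K generation gives practitioners several chances to find a usable patch example.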