Abstract: Recent work in automated program repair (APR) proposes the use of reasoning and patch validation feedback to reduce the semantic gap between the LLMs and the code under analysis. The idea has been shown to perform well for general APR, but its effectiveness in other particular contexts remains underexplored. In this work, we assess the impact of reasoning and patch validation feedback to LLMs in the context of vulnerability repair, an important and challenging task in security. To support the evaluation, we present VRpilot, an LLM-based vulnerability repair technique based on reasoning and patch validation feedback. VRpilot (1) uses a chain-of-thought prompt to reason about a vulnerability prior to generating patch candidates and (2) iteratively refines prompts according to the output of external tools (e.g., compiler, code sanitizers, test suite) on previously generated patches. To evaluate performance, we compare VRpilot against the state-of-the-art vulnerability repair techniques for C and Java using public datasets from the literature. Our results show that VRpilot generates, on average, 14% and 7.6% more correct patches than the baseline techniques on C and Java, respectively. We show, through an ablation study, that reasoning and patch validation feedback are critical. We report several lessons from this study and potential directions for advancing LLM-empowered vulnerability repair.
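The two steps named in the abstract (reason first, then refine prompts using tool output) can be illustrated with a minimal Python sketch. This is not VRpilot's implementation: the prompt wording, the function and class names (`repair`, `ValidationResult`, `build_reasoning_prompt`, `build_patch_prompt`), and the iteration budget are all assumptions made for illustration. The model and the external tools are passed in as plain callables, so no particular LLM API or toolchain is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    """Outcome of one external check (e.g., compiler, sanitizer, test suite)."""
    tool: str
    passed: bool
    message: str = ""


# Hypothetical interfaces: an LLM maps prompt text to completion text, and a
# validator maps a candidate patch to a ValidationResult.
LLM = Callable[[str], str]
Validator = Callable[[str], ValidationResult]


def build_reasoning_prompt(vulnerable_code: str) -> str:
    """Step 1: ask the model to reason about the vulnerability before patching."""
    return (
        "Explain step by step why the following code is vulnerable.\n\n"
        + vulnerable_code
    )


def build_patch_prompt(vulnerable_code: str, reasoning: str,
                       feedback: List[str]) -> str:
    """Step 2: request a patch, including tool feedback on earlier attempts."""
    parts = [
        "Vulnerable code:\n" + vulnerable_code,
        "Analysis of the vulnerability:\n" + reasoning,
    ]
    if feedback:
        parts.append("Earlier patches failed validation:\n" + "\n".join(feedback))
    parts.append("Write a patched version of the code.")
    return "\n\n".join(parts)


def repair(vulnerable_code: str, llm: LLM, validators: List[Validator],
           max_iterations: int = 3) -> Optional[str]:
    """Return the first patch that passes every validator, or None."""
    reasoning = llm(build_reasoning_prompt(vulnerable_code))
    feedback: List[str] = []
    for _ in range(max_iterations):
        patch = llm(build_patch_prompt(vulnerable_code, reasoning, feedback))
        failure = None
        # Run validators in order and stop at the first failing one.
        for validate in validators:
            result = validate(patch)
            if not result.passed:
                failure = result
                break
        if failure is None:
            return patch
        # Record the failure so the next prompt can take it into account.
        feedback.append(f"[{failure.tool}] {failure.message}")
    return None
```

In this sketch each validator would wrap one external tool, for example a function that compiles the patched file and returns the compiler's error text on failure. Stopping at the first failing validator keeps each feedback message focused on a single problem; collecting all failures per iteration would be an equally reasonable design.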

What problem does this paper attempt to address?

The problem this paper addresses is: **how to use large language models (LLMs), together with reasoning and patch validation feedback, to make automated vulnerability repair more effective, particularly in security settings**.

### Specific problem description

1. **Semantic gap**: Existing LLMs understand code semantics imperfectly, especially in complex code-related tasks such as vulnerability repair. They often miss specific details of the code under analysis, so the patches they generate may not behave as intended.
2. **Limitations of existing methods**: Prior research has shown that LLMs are effective for general program repair, but that effectiveness has not been fully verified in specific security contexts such as vulnerability repair. In particular, existing methods have limited ability to handle compilation errors, functional test failures, and security test failures.
3. **Improving repair effectiveness**: The paper proposes a new method, VRpilot, which aims to reduce the semantic gap between LLMs and the code under analysis by introducing reasoning and patch validation feedback, thereby improving the success rate and accuracy of vulnerability repair.

### Main contributions of the paper

1. **Proposing the VRpilot tool**: VRpilot uses LLMs for vulnerability repair and is built on reasoning and feedback. It uses chain-of-thought prompts to help the LLM reason about the vulnerability, and it iteratively refines the generated patches using feedback from external tools (such as compilers and test suites).
2. **Evaluating performance**: The paper experimentally compares VRpilot with existing state-of-the-art vulnerability repair techniques (such as CodexVR) on C and Java datasets. On average, VRpilot generates 14% (C) and 7.6% (Java) more correct patches than the baseline techniques.
3. **Ablation study**: An ablation study verifies that both reasoning and feedback matter to VRpilot's performance. Both components have a significant impact on the proportion of reasonable patches generated, and removing either one causes a significant drop in performance.

### Conclusion

By introducing reasoning and patch validation feedback, the paper improves the performance of LLMs on vulnerability repair. The method raises the repair success rate and copes better with complex security problems. Future research can explore how to incorporate more domain knowledge into LLMs to make them more effective for automated vulnerability repair.

A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback
