How the Training Procedure Impacts the Performance of Deep Learning-based Vulnerability Patching

Antonio Mastropaolo, Vittoria Nardone, Gabriele Bavota, Massimiliano Di Penta
2024-04-27
Abstract: Generative deep learning (DL) models have been successfully adopted for vulnerability patching. However, such models require the availability of a large dataset of patches to learn from. To overcome this issue, researchers have proposed to start from models pre-trained with general knowledge, either on the programming language or on similar tasks such as bug fixing. Despite the efforts in the area of automated vulnerability patching, there is a lack of systematic studies on how these different training procedures impact the performance of DL models for such a task. This paper provides a manyfold contribution to bridge this gap, by (i) comparing existing solutions of self-supervised and supervised pre-training for vulnerability patching; and (ii) for the first time, experimenting with different kinds of prompt-tuning for this task. The study required to train/test 23 DL models. We found that a supervised pre-training focused on bug-fixing, while expensive in terms of data collection, substantially improves DL-based vulnerability patching. When applying prompt-tuning on top of this supervised pre-trained model, there is no significant gain in performance. Instead, prompt-tuning is an effective and cheap solution to substantially boost the performance of self-supervised pre-trained models, i.e., those not relying on the bug-fixing pre-training.
Software Engineering
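To make the prompt-tuning idea in the abstract more concrete, here is a minimal Python sketch of the two forms a prompt usually takes, assuming a sequence-to-sequence model that receives either text or a sequence of embedding vectors. A hard prompt wraps the vulnerable function in a fixed natural-language template; a soft prompt prepends continuous vectors (virtual tokens) that are learned during training. The template wording, the function names (`build_hard_prompt`, `init_soft_prompt`, `prepend_soft_prompt`) and the dimensions are hypothetical illustrations, not the prompts used in the paper.

```python
import random

# Hypothetical hard-prompt template; the wording is illustrative, not taken from the paper.
HARD_TEMPLATE = "Generate a patch for this vulnerable {lang} function: {code} Patched function:"


def build_hard_prompt(vulnerable_code: str, lang: str = "C") -> str:
    """Wrap a vulnerable function in a fixed natural-language template (hard prompt)."""
    # Collapse all whitespace runs so the model input is a single line.
    normalized = " ".join(vulnerable_code.split())
    return HARD_TEMPLATE.format(lang=lang, code=normalized)


def init_soft_prompt(num_virtual_tokens: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Create randomly initialized soft-prompt vectors, one per virtual token."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_virtual_tokens)]


def prepend_soft_prompt(soft_prompt: list[list[float]],
                        token_embeddings: list[list[float]]) -> list[list[float]]:
    """Prepend the trainable soft-prompt vectors to the embedded input tokens."""
    if soft_prompt and token_embeddings and len(soft_prompt[0]) != len(token_embeddings[0]):
        raise ValueError("soft prompt and token embeddings must share the same dimension")
    return soft_prompt + token_embeddings


if __name__ == "__main__":
    snippet = "int f(char *s) {\n  char b[8];\n  strcpy(b, s);\n}"
    print(build_hard_prompt(snippet))
    # Prints: Generate a patch for this vulnerable C function: int f(char *s) { char b[8]; strcpy(b, s); } Patched function:

    prompt = init_soft_prompt(num_virtual_tokens=4, dim=8)
    inputs = [[0.0] * 8 for _ in range(3)]  # stand-in for 3 embedded input tokens
    print(len(prepend_soft_prompt(prompt, inputs)))  # Prints: 7
```

The sketch only shows how the model input is assembled, not the training loop. Some prompt-tuning variants keep the model weights frozen and train only the soft-prompt vectors, while others update the prompt and the model together.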
What problem does this paper attempt to address?
The paper addresses the use of deep learning (DL) for automated software vulnerability patching. The authors compare self-supervised and supervised pre-training and also experiment with different kinds of prompt-tuning for this task. After training and testing 23 DL models, they find that supervised pre-training focused on bug fixing, although expensive in terms of data collection, substantially improves DL-based vulnerability patching. Prompt-tuning substantially boosts the performance of self-supervised pre-trained models, i.e., those that do not rely on bug-fixing pre-training, but brings no significant gain for models that have already undergone supervised pre-training.

The main contributions of the paper are:
1. A comparison of self-supervised and supervised pre-training for vulnerability patching.
2. Experiments with several kinds of prompt-tuning for the vulnerability patching task.
3. The observation that duplicate instances in the dataset can inflate the measured performance, together with a cleaning of the data to remove them.

The results indicate that, although self-supervised pre-training reduces the need for large amounts of task-specific data, supervised pre-training on bug fixing remains beneficial given the limited fine-tuning data available (real-world vulnerability patches), and it outperforms self-supervised pre-training alone.

The related work section covers other machine learning-based vulnerability patching approaches, as well as recent applications of prompt-tuning to software engineering tasks. Through this series of experiments, the paper aims to fill the gap left by the lack of a systematic study of how different training strategies affect vulnerability patching performance.