Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

Daan Hommersom, Antonino Sabetta, Bonaventura Coppola, Dario Di Nucci, Damian A. Tamburri
DOI: https://doi.org/10.1145/3649590
IF: 3.685
2024-03-04
ACM Transactions on Software Engineering and Methodology
Abstract: The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience with machine learning (ML), specifically natural language processing (NLP), to address this problem. Our method consists of three phases. First, we construct an advisory record object containing key information about a vulnerability that is extracted from an advisory, such as those found in the National Vulnerability Database (NVD). These advisories are expressed in natural language. Second, using heuristics, a subset of candidate fix commits is obtained from the source code repository of the affected project by filtering out commits that can be identified as unrelated to the vulnerability at hand. Finally, for each of the remaining candidate commits, our method builds a numerical feature vector reflecting the characteristics of the commit that are relevant to predicting its match with the advisory at hand. Based on the values of these feature vectors, our method produces a ranked list of candidate fix commits. The score attributed by the ML model to each feature is kept visible to users, allowing them to easily interpret the predictions. We implemented our approach and evaluated it on an open data set, built by manual curation, that comprises 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. When considering the top-10 commits in the ranked results, our implementation could successfully identify at least one fix commit for up to 84.03% of the vulnerabilities (with a fix commit in the first position for 65.06% of the vulnerabilities). Our evaluation shows that our method can considerably reduce the manual effort needed to search OSS repositories for the commits that fix known vulnerabilities.
computer science, software engineering
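
The three phases described in the abstract (advisory record, heuristic candidate filtering, feature-based ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (AdvisoryRecord, Commit, filter_candidates, feature_vector, rank, WEIGHTS) is hypothetical, the date-window filter and keyword-overlap features are assumed stand-ins for the paper's heuristics, and the hand-set linear weights stand in for the trained ML model. The per-feature contributions are returned alongside each score to mirror the interpretability the abstract mentions.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AdvisoryRecord:
    """Phase 1: key facts extracted from a natural-language advisory."""
    vuln_id: str
    description: str
    published: date


@dataclass
class Commit:
    sha: str
    message: str
    changed_files: list
    committed: date


def keywords(text):
    """Lower-cased alphabetic tokens of four or more characters."""
    return set(re.findall(r"[a-z_]{4,}", text.lower()))


def filter_candidates(advisory, commits, window_days=180):
    """Phase 2 (assumed heuristic): drop commits dated far from the advisory."""
    window = timedelta(days=window_days)
    return [c for c in commits if abs(c.committed - advisory.published) <= window]


def feature_vector(advisory, commit):
    """Phase 3: numeric features describing how well a commit matches."""
    adv = keywords(advisory.description)
    msg = keywords(commit.message)
    paths = keywords(" ".join(commit.changed_files))
    return {
        "mentions_vuln_id": float(advisory.vuln_id.lower() in commit.message.lower()),
        "message_overlap": len(adv & msg) / max(len(adv), 1),
        "path_overlap": len(adv & paths) / max(len(adv), 1),
    }


# Illustrative hand-set weights; a trained model would supply these.
WEIGHTS = {"mentions_vuln_id": 3.0, "message_overlap": 2.0, "path_overlap": 1.0}


def rank(advisory, commits):
    """Return (score, sha, per-feature contributions), best match first."""
    scored = []
    for commit in filter_candidates(advisory, commits):
        features = feature_vector(advisory, commit)
        contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
        scored.append((sum(contributions.values()), commit.sha, contributions))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Made-up example data (placeholder identifier, not a real advisory).
advisory = AdvisoryRecord(
    "CVE-0000-0001",
    "Path traversal in archive extraction allows writing files outside target directory",
    date(2020, 1, 15),
)
commits = [
    Commit("bbb", "Update README badges", ["README.md"], date(2020, 1, 12)),
    Commit("aaa", "Fix path traversal in archive extraction (CVE-0000-0001)",
           ["src/archive/extract.py"], date(2020, 1, 10)),
    Commit("ccc", "Fix path traversal in archive extraction",
           ["src/archive/extract.py"], date(2018, 1, 1)),
]
ranked = rank(advisory, commits)
```

In this toy run, commit "ccc" is discarded by the date filter before any scoring happens, and the remaining commits are ordered by their summed feature contributions, which a reviewer can inspect one feature at a time.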