Identifying Affected Libraries and Their Ecosystems for Open Source Software Vulnerabilities

Susheng Wu, Wenyan Song, Kaifeng Huang, Bihuan Chen, Xin Peng
DOI: https://doi.org/10.1145/3597503.3639582
2024-01-01
Abstract: Software composition analysis (SCA) tools have been widely adopted to identify vulnerable libraries used in software applications. Such SCA tools depend on a vulnerability database to know the affected libraries of each vulnerability. However, it is labor-intensive and error-prone for a security team to maintain the vulnerability database manually. While several approaches adopt extreme multi-label learning to predict affected libraries for vulnerabilities, they are ineffective in practice because their library labels are limited and they are unaware of ecosystems. To address these problems, we first conduct an empirical study to assess the quality of two fields, i.e., affected libraries and their ecosystems, in four vulnerability databases. Our study reveals notable inconsistency and inaccuracy in these two fields. Then, we propose Holmes to identify affected libraries and their ecosystems for vulnerabilities via a learning-to-rank technique. The key idea of Holmes is to gather various pieces of evidence about affected libraries and their ecosystems from multiple sources, and to learn to rank a pool of libraries based on their relevance to that evidence. Our extensive experiments have shown the effectiveness, efficiency, and usefulness of Holmes.