Pre-Patch: Find Hidden Threats In Open Software Based On Machine Learning Method

Mutian Yang,Jingzheng Wu,Shouling Ji,Tianyue Luo,Yanjun Wu
DOI: https://doi.org/10.1007/978-3-319-94472-2_4
2018-01-01
Abstract:The details of vulnerabilities are always kept confidential until fixed, which is an efficient way to avoid the exploitations and attacks. However, the Security Related Commits (SRCs), used to fix the vulnerabilities in open source software, usually lack proper protections. Most SRCs are released in code repositories such as Git, Github, Source-forge, etc. earlier than the corresponding vulnerabilities published. These commits often previously disclose the vital information which can be used by the attackers to locate and exploit the vulnerable code. Therefore, we defined the pre-leaked SRC as the Pre-Patch problem and studied its hidden threats to the open source software. In this paper, we presented an Automatic Security Related Commits Detector (ASRCD) to rapidly identify the Pre-Patch problems from the numerous commits in code repositories by learning the features of SRCs. We implemented ASRCD and evaluated it with 78,218 real-world commits collected from Linux Kernel, OpenSSL, phpMyadmin and Mantisbt released between 2016 to 2017, which contain 227 confirmed SRCs. ASRCD successfully identified 206 SRCs from the 4 projects, including 140 known SRCs (recall rate: 61.7% on average) and 66 new high-suspicious. In addition, 5 of the SRCs have been published after our prediction. The results show that: (1) the Pre-Patch is really a hidden threat to open source software; and (2) the proposed ASRCD is effective in identifying such SRCs. Finally, we recommended the identified SRCs should be fixed as soon as possible.
What problem does this paper attempt to address?