VulSeeker-pro: Enhanced Semantic Learning Based Binary Vulnerability Seeker with Emulation
Jian Gao,Xin Yang,Ying Fu,Yu Jiang,Heyuan Shi,Jiaguang Sun
DOI: https://doi.org/10.1145/3236024.3275524
2018-01-01
Abstract:Learning-based clone detection is widely exploited for binary vulnerability search. Although they solve the problem of high time overhead of traditional dynamic and static search approaches to some extent, their accuracy is limited, and need to manually identify the true positive cases among the top-M search results during the industrial practice. This paper presents VulSeeker-Pro, an enhanced binary vulnerability seeker that integrates function semantic emulation at the back end of semantic learning, to release the engineers from the manual identification work. It first uses the semantic learning based predictor to quickly predict the top-M candidate functions which are the most similar to the vulnerability from the target binary. Then the top-M candidates are fed to the emulation engine to resort, and more accurate top-N candidate functions are obtained. With fast filtering of semantic learning and dynamic trace generation of function semantic emulation, VulSeeker-Pro can achieve higher search accuracy with little time overhead. The experimental results on 15 known CVE vulnerabilities involving 6 industry widely used programs show that VulSeeker-Pro significantly outperforms the state-of-the-art approaches in terms of accuracy. In a total of 45 searches, VulSeeker-Pro finds 40 and 43 real vulnerabilities in the top-1 and top-5 candidate functions, which are 12.33× and 2.58× more than the most recent and related work Gemini. In terms of efficiency, it takes 0.22 seconds on average to determine whether the target binary function contains a known vulnerability or not.
What problem does this paper attempt to address?