BEDetector: A Two-Channel Encoding Method to Detect Vulnerabilities Based on Binary Similarity

Lu Yu, Yuliang Lu, Yi Shen, Hui Huang, Kailong Zhu
DOI: https://doi.org/10.1109/access.2021.3064687
IF: 3.9
2021-01-01
IEEE Access
Abstract: Applying neural network technology to binary similarity detection has become a promising research topic, and vulnerability detection is an important application of binary similarity detection. When a neural network embeds binary code into a matrix, the problem of feature representation must also be solved for vulnerability detection. However, most existing work extracts only the syntactic or structural features of binary code and takes the basic block as the minimum analysis unit, which is relatively coarse. In addition, the structural features of binary functions are usually represented by a dependency graph, and the embedding process captures only the neighbour information of each node while ignoring the global information of the graph. To solve these two problems, we propose a two-channel feature extraction method that obtains semantic features at a finer granularity and represents structural features globally instead of locally. Inspired by natural language processing, we propose a contextual semantic feature extraction method to obtain features of binary functions at different granularities. It takes the instruction as the minimum analysis unit and captures the semantic relationships between instructions. Meanwhile, to represent the structural features of each function, we propose a neural GAE model instead of the widely used structure2vec model. In this way, we can preserve and reconstruct the control dependencies between the basic blocks across the whole graph. We have implemented a prototype system, BEDetector, evaluated the effectiveness of its neural model, and compared its accuracy in detecting vulnerable functions with the state of the art. In addition, we choose real-world firmware files as the detection target and show that BEDetector achieves a relatively high detection rate.
BEDetector reaches a precision of 88.8%, 86.7%, and 100% when ranking the top-50 candidate functions in the detection of the CVE vulnerability functions ssl3_get_key_exchange, ssl3_get_new_session_ticket, and udhcp_get_option, respectively, demonstrating the effectiveness of our method.
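
To illustrate how a graph autoencoder (GAE) can encode and reconstruct the control dependencies between basic blocks, the following is a minimal sketch of a GAE forward pass. It assumes the standard formulation (a two-layer graph convolutional encoder with an inner-product decoder); the abstract does not specify BEDetector's actual architecture, so the function names, layer sizes, one-hot block features, untrained random weights, and the treatment of control-flow edges as undirected are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gae_forward(adj, features, w0, w1):
    """Encode nodes into embeddings Z, then decode edge probabilities sigmoid(Z Z^T)."""
    a_norm = normalize_adjacency(adj)
    # Encoder layer 1: ReLU(A_norm X W0)
    hidden = [[max(0.0, v) for v in row] for row in matmul(a_norm, matmul(features, w0))]
    # Encoder layer 2: Z = A_norm H W1
    z = matmul(a_norm, matmul(hidden, w1))
    # Inner-product decoder: one probability per pair of basic blocks
    logits = matmul(z, list(zip(*z)))
    recon = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]
    return z, recon


# Hypothetical control-flow graph with 4 basic blocks: 0->1, 0->2, 1->3, 2->3
# (an if/else diamond), stored as a symmetric adjacency matrix (assumption).
adj = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
# One-hot block features stand in for real per-block semantic features.
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

rng = random.Random(0)
w0 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]  # 4 -> 3
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(3)]  # 3 -> 2

z, recon = gae_forward(adj, features, w0, w1)
```

Here `z` holds one embedding per basic block and `recon` holds the decoded edge probability for every pair of blocks. In a trained model the weights would be fitted so that `recon` approximates `adj`, and because the decoder scores all node pairs, the training objective covers the whole graph rather than only each node's neighbourhood.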
computer science, information systems; telecommunications; engineering, electrical & electronic
What problem does this paper attempt to address?