Semantic Learning and Emulation Based Cross-Platform Binary Vulnerability Seeker
Jian Gao,Yu Jiang,Zhe Liu,Xin Yang,Cong Wang,Xun Jiao,Zijiang Yang,Jiaguang Sun
DOI: https://doi.org/10.1109/tse.2019.2956932
IF: 7.4
2021-11-01
IEEE Transactions on Software Engineering
Abstract:Clone detection is widely exploited for software vulnerability search. The approaches based on source code analysis cannot be applied to binary clone detection because the same source code can produce significantly different binaries due to different operating systems, microprocessor architectures and compilers. In this paper, we present BinSeeker, a cross-platform binary seeker that integrates semantic learning and emulation. With the help of the labeled semantic flow graph, BinSeeker can quickly identify <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.442ex" height="2.176ex" style="vertical-align: -0.338ex;" viewBox="0 -791.3 1051.5 936.9" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-4D" x="0" y="0"></use></g></svg></span>M candidate functions that are most similar to the vulnerability from the target binary. The value of <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.442ex" height="2.176ex" style="vertical-align: -0.338ex;" viewBox="0 -791.3 1051.5 936.9" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-4D" x="0" y="0"></use></g></svg></span>M is relatively large so this semantic learning procedure essentially eliminates those functions that are very unlikely to have the vulnerability. Then, semantic emulation is conducted on these <span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.442ex" height="2.176ex" style="vertical-align: -0.338ex;" viewBox="0 -791.3 1051.5 936.9" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-4D" x="0" y="0"></use></g></svg></span>M candidates to obtain their dynamic signature sequences. By comparing signature sequences, BinSeeker produces top-<span class="mjpage"><svg xmlns:xlink="http://www.w3.org/1999/xlink" width="2.064ex" height="2.176ex" style="vertical-align: -0.338ex;" viewBox="0 -791.3 888.5 936.9" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)"> <use xlink:href="#MJMATHI-4E" x="0" y="0"></use></g></svg></span>N functions that exhibit most similar behavior to that of the vulnerability. With fast filtering of semantic learning and accurate comparison of semantic emulation, BinSeeker seeks vulnerability precisely with little overhead. The experimen-s on six widely used programs with fifteen known CVE vulnerabilities demonstrate that BinSeeker outperforms three state-of-the-art tools Genius, Gemini and CACompare. Regarding search accuracy, BinSeeker achieves an MRR value of 0.65 in the target programs, whereas the MRR values by Genius, Gemini and CACompare are 0.17, 0.07 and 0.42, respectively. If we consider ranking a function with the targeted vulnerability in the top-5 as accurate, BinSeeker achieves the accuracy of 93.33 percent, while the accuracy of the other three tools is merely 33.33, 13.33 and 53.33 percent, respectively. Such accuracy is achieved with 0.27s on average to determine whether the target binary function contains a known vulnerability, and the time for the other three tools are 1.57s, 0.15s and 0.98s, respectively. Compared to the time used to manually identify the true positive vulnerability from the false positive candidates reported by Gemini, the time overhead of BinSeeker is negligible. Evidently, the proposed BinSeeker achieves a better balance between accuracy and efficiency.<svg xmlns="http://www.w3.org/2000/svg" style="display: none;"><defs id="MathJax_SVG_glyphs"><path stroke-width="1" id="MJMATHI-4D" d="M289 629Q289 635 232 637Q208 637 201 638T194 648Q194 649 196 659Q197 662 198 666T199 671T201 676T203 679T207 681T212 683T220 683T232 684Q238 684 262 684T307 683Q386 683 398 683T414 678Q415 674 451 396L487 117L510 154Q534 190 574 254T662 394Q837 673 839 675Q840 676 842 678T846 681L852 683H948Q965 683 988 683T1017 684Q1051 684 1051 673Q1051 668 1048 656T1045 643Q1041 637 1008 637Q968 636 957 634T939 623Q936 618 867 340T797 59Q797 55 798 54T805 50T822 48T855 46H886Q892 37 892 35Q892 19 885 5Q880 0 869 0Q864 0 828 1T736 2Q675 2 644 2T609 1Q592 1 592 11Q592 13 594 25Q598 41 602 43T625 46Q652 46 685 49Q699 52 704 61Q706 65 742 207T813 490T848 631L654 322Q458 10 453 5Q451 4 449 3Q444 0 433 0Q418 0 415 7Q413 11 374 317L335 624L267 354Q200 88 200 79Q206 46 272 46H282Q288 41 289 37T286 19Q282 3 278 1Q274 0 267 0Q265 0 255 0T221 1T157 2Q127 2 95 1T58 0Q43 0 39 2T35 11Q35 13 38 25T43 40Q45 46 65 46Q135 46 154 86Q158 92 223 354T289 629Z"></path><path stroke-width="1" id="MJMATHI-4E" d="M234 637Q231 637 226 637Q201 637 196 638T191 649Q191 676 202 682Q204 683 299 683Q376 683 387 683T401 677Q612 181 616 168L670 381Q723 592 723 606Q723 633 659 637Q635 637 635 648Q635 650 637 660Q641 676 643 679T653 683Q656 683 684 682T767 680Q817 680 843 681T873 682Q888 682 888 672Q888 650 880 642Q878 637 858 637Q787 633 769 597L620 7Q618 0 599 0Q585 0 582 2Q579 5 453 305L326 604L261 344Q196 88 196 79Q201 46 268 46H278Q284 41 284 38T282 19Q278 6 272 0H259Q228 2 151 2Q123 2 100 2T63 2T46 1Q31 1 31 10Q31 14 34 26T39 40Q41 46 62 46Q130 49 150 85Q154 91 221 362L289 634Q287 635 234 637Z"></path></defs></svg>
engineering, electrical & electronic,computer science, software engineering