1-to-1 or 1-to-n? Investigating the Effect of Function Inlining on Binary Similarity Analysis

Ang Jia, Ming Fan, Wuxia Jin, Xi Xu, Zhaohui Zhou, Qiyi Tang, Sen Nie, Shi Wu, Ting Liu
DOI: https://doi.org/10.1145/3561385
Impact factor: 3.685
2023-01-01
ACM Transactions on Software Engineering and Methodology
Abstract: Binary similarity analysis is critical to many code-reuse-related issues, and function matching is its fundamental task. Most binary similarity analysis works apply a “1-to-1” mechanism, in which one function in a binary file is matched against one function in a source file or binary file. However, we discover that, because of function inlining, function mapping is a more complex problem than traditionally understood: it is “1-to-n” (one binary function matches multiple source functions or binary functions) or even “n-to-n” (multiple binary functions match multiple binary functions). In this article, we investigate the effect of function inlining on binary similarity analysis. We carry out three studies to investigate the extent of function inlining, the performance of existing works under function inlining, and the effectiveness of existing inlining-simulation strategies. First, we design a scalable and lightweight identification method to recover function inlining in binaries. We collect and analyze 88 projects (compiled in 288 versions, yielding 32,460,156 binary functions) to construct four inlining-oriented datasets for four security tasks in the software supply chain: code search, OSS (Open Source Software) reuse detection, vulnerability detection, and patch presence test. The datasets reveal that the proportion of function inlining ranges from 30% to 40% when using O3 and can sometimes reach nearly 70%. Then, we evaluate four existing works on our dataset. Results show that most existing works neglect inlining and use the “1-to-1” mechanism. The resulting mismatches cause a 30% loss in performance during code search and a 40% loss during vulnerability detection. Moreover, most inlined functions would be ignored during OSS reuse detection and patch presence test, leaving these functions at risk. Finally, we analyze two inlining-simulation strategies on our dataset. They miss nearly 40% of the inlined functions, so there is still considerable room for improvement. By precisely recovering when function inlining happens, we discover that inlining is usually cumulative as the optimization level increases. Thus, we recommend conditional inlining and incremental inlining for designing a low-cost, high-coverage inlining-simulation strategy.
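To make the “1-to-n” idea concrete, here is a minimal Python sketch. It is not the authors' identification method; every name in it (`standalone_functions`, `covered_sources`, `binary_to_source_mapping`, and the toy functions `main`, `parse`, `read_token`, `log`) is hypothetical. It assumes we already know a source-level call graph and which call edges the compiler inlined, and it makes one further simplifying assumption: a function is emitted as a separate binary function only if it has no callers or at least one call site that was not inlined. Each surviving binary function is then mapped to every source function whose body was inlined into it, directly or transitively.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical inputs (illustrative only, not the paper's data format):
#   call_graph: source-level caller -> list of callees
#   inlined:    the (caller, callee) call edges the compiler inlined
CallGraph = Dict[str, List[str]]
Edge = Tuple[str, str]


def standalone_functions(call_graph: CallGraph, inlined: Set[Edge]) -> Set[str]:
    """Functions assumed to survive as separate binary functions.

    Simplifying assumption: a function is emitted on its own if it has no
    callers (an entry point) or at least one call site that was NOT inlined.
    """
    funcs: Set[str] = set(call_graph)
    has_caller: Set[str] = set()
    has_outlined_caller: Set[str] = set()
    for caller, callees in call_graph.items():
        for callee in callees:
            funcs.add(callee)
            has_caller.add(callee)
            if (caller, callee) not in inlined:
                has_outlined_caller.add(callee)
    return {f for f in funcs if f not in has_caller or f in has_outlined_caller}


def covered_sources(root: str, call_graph: CallGraph, inlined: Set[Edge]) -> Set[str]:
    """Source functions whose code ends up inside binary function `root`,
    following inlined edges transitively (the covered set also stops cycles)."""
    covered = {root}
    stack = [root]
    while stack:
        caller = stack.pop()
        for callee in call_graph.get(caller, []):
            if (caller, callee) in inlined and callee not in covered:
                covered.add(callee)
                stack.append(callee)
    return covered


def binary_to_source_mapping(call_graph: CallGraph, inlined: Set[Edge]) -> Dict[str, Set[str]]:
    """1-to-n mapping: each standalone binary function -> its set of source functions."""
    return {
        root: covered_sources(root, call_graph, inlined)
        for root in standalone_functions(call_graph, inlined)
    }


if __name__ == "__main__":
    cg = {"main": ["parse", "log"], "parse": ["read_token", "log"], "read_token": [], "log": []}
    inl = {("main", "parse"), ("parse", "read_token")}
    for name, sources in sorted(binary_to_source_mapping(cg, inl).items()):
        print(name, sorted(sources))
    # prints:
    # log ['log']
    # main ['main', 'parse', 'read_token']
```

In this toy example, a “1-to-1” matcher would compare the binary `main` only against the source `main`, even though it also contains the code of `parse` and `read_token`; the mapping makes that 1-to-3 relationship explicit. The “n-to-n” case (binary-to-binary matching across optimization levels) would require comparing two such mappings, which this sketch does not attempt.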
What problem does this paper attempt to address?