CATI: Context-Assisted Type Inference from Stripped Binaries

Ligeng Chen, Zhongling He, Bing Mao
DOI: https://doi.org/10.1109/dsn48063.2020.00028
2020-01-01
Abstract: Code analysis is a powerful way to eliminate vulnerabilities. Closed-source programs, however, lack information that is crucial for code analysis, because that information is stripped during compilation to reduce executable size. Restoring it has long been a challenge for experts. Variable type information is fundamental to this process because it provides a perspective on program semantics. In this paper, we present an efficient approach for inferring types, and we overcome the challenge posed by the scattered information that static analysis yields on stripped binaries. We discover that neighboring instructions are likely to operate on variables of the same type, and we leverage this observation to enrich the features we rely on. Building on it, we implement a system called CATI, which locates variables in stripped binaries and can infer 19 different variable types. Experiments show that it infers variable types with 71.2% accuracy on unseen binaries. Meanwhile, it takes approximately 6 seconds to process a typical binary.
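The abstract's key observation is that neighboring instructions tend to operate on variables of the same type. It suggests gathering, for every instruction that touches a variable, a window of surrounding instructions as extra context. What follows is a minimal sketch of that idea under stated assumptions, not CATI's actual implementation. Instructions are plain disassembly strings, and a variable is identified by a stack-slot operand string such as `[rbp-0x4]`. The window half-width `k`, the `<PAD>` token, and the helper names `context_window` and `windows_for_variable` are hypothetical. Feeding the windows to a type classifier is not shown.

```python
from typing import List

PAD = "<PAD>"  # hypothetical filler token for windows that run past either end


def context_window(instrs: List[str], center: int, k: int = 2) -> List[str]:
    """Return instrs[center] plus up to k neighbors on each side.

    Positions outside the instruction list are filled with PAD, so every
    window has exactly 2*k + 1 entries.
    """
    return [
        instrs[i] if 0 <= i < len(instrs) else PAD
        for i in range(center - k, center + k + 1)
    ]


def windows_for_variable(instrs: List[str], operand: str, k: int = 2) -> List[List[str]]:
    """Collect one context window per instruction whose text mentions `operand`.

    Assumption: a variable is recognized by a simple substring match on its
    operand string (e.g. a stack slot); real variable recovery is far harder.
    """
    return [context_window(instrs, i, k) for i, ins in enumerate(instrs) if operand in ins]


if __name__ == "__main__":
    # Toy disassembly, made up for illustration only.
    instrs = [
        "mov dword ptr [rbp-0x4], 0",
        "mov eax, dword ptr [rbp-0x4]",
        "add eax, 1",
        "mov dword ptr [rbp-0x4], eax",
        "movsd xmm0, qword ptr [rbp-0x10]",
    ]
    ws = windows_for_variable(instrs, "[rbp-0x4]", k=1)
    print(len(ws))  # 3: the variable appears in instructions 0, 1 and 3
    for w in ws:
        print(w)
```

Padding keeps every window the same length. A fixed-size input is convenient if the windows are later encoded for a learned classifier. Predictions from the several windows of one variable could then be combined, for example by a vote, though the abstract does not specify how CATI aggregates them.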