Similarity of Binaries Across Optimization Levels and Obfuscation

Jianguo Jiang, Gengwang Li, Min Yu, Gang Li, Chao Liu, Zhiqiang Lv, Bin Lv, Weiqing Huang
DOI: https://doi.org/10.1007/978-3-030-58951-6_15
2020-01-01
Abstract: Binary code similarity evaluation has been widely applied in security. Unfortunately, compiler optimization and obfuscation techniques pose challenges that have not been well addressed by existing approaches. In this paper, we propose a prototype, ImOpt, that re-optimizes code to boost similarity evaluation. The key contribution is an immediate SSA (static single-assignment) transforming algorithm that provides a very fast pointer analysis, enabling more thorough re-optimization. The algorithm transforms variables and even pointers into SSA form on the fly, so that def-use and reachability information can be maintained promptly. Using the immediate SSA transforming algorithm, ImOpt canonicalizes code and eliminates junk code to alleviate the perturbation from optimization and obfuscation. We show that ImOpt improves the accuracy of a state-of-the-art similarity evaluation approach by 22.7%. Our experimental results demonstrate that the bottleneck part of our SSA transforming algorithm runs 15.7x faster than one of the best similar methods. Furthermore, we show that ImOpt is robust to many obfuscation techniques that are based on data dependency.
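To make the idea of on-the-fly SSA renaming and junk-code elimination concrete, here is a minimal sketch in Python. It is not the ImOpt algorithm: it handles only straight-line three-address code (no branches, phi nodes, or pointers), and the instruction format, the function names `to_ssa` and `eliminate_dead`, and the `_0` suffix for undefined inputs are all assumptions made for illustration.

```python
from collections import defaultdict

def to_ssa(instrs):
    """Rename a straight-line program into SSA form in one forward pass.

    instrs: list of (dest, op, args); variables are str, constants are int.
    Returns the SSA program, a def-use map, and the final name of each variable.
    """
    version = defaultdict(int)   # highest version issued so far per variable
    current = {}                 # variable -> SSA name currently holding its value
    uses = defaultdict(list)     # SSA name -> indices of instructions that read it
    out = []
    for dest, op, args in instrs:
        new_args = []
        for a in args:
            if isinstance(a, str):
                # A variable read before any definition is treated as an input.
                name = current.get(a, a + "_0")
                uses[name].append(len(out))
                new_args.append(name)
            else:
                new_args.append(a)
        version[dest] += 1
        ssa_dest = f"{dest}_{version[dest]}"
        current[dest] = ssa_dest
        out.append((ssa_dest, op, tuple(new_args)))
    return out, dict(uses), current

def eliminate_dead(ssa, live_outs):
    """Drop instructions whose results never reach a live output."""
    live = set(live_outs)
    kept = []
    for dest, op, args in reversed(ssa):
        if dest in live:
            kept.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept

# Hypothetical input: the write to t is junk because nothing reads it.
program = [
    ("x", "add", ("a", 1)),
    ("t", "mul", ("x", 7)),
    ("x", "add", ("x", 2)),
    ("y", "mul", ("x", "b")),
]
ssa, uses, current = to_ssa(program)
cleaned = eliminate_dead(ssa, [current["y"]])
```

Because every SSA name is defined exactly once, the def-use map is built during the same pass as the renaming, and the junk instruction writing `t_1` is removed by a single backward sweep from the live output.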