Cross-Architecture Binary Semantics Understanding via Similar Code Comparison

Yikun Hu, Yuanyuan Zhang, Juanru Li, Dawu Gu
DOI: https://doi.org/10.1109/saner.2016.50
2016-01-01
Abstract: With the prevalence of smart devices (e.g., smartphones, routers, cameras), more and more programs are being ported from traditional desktop platforms to embedded hardware with ARM or MIPS architectures. Although the compiled binary code differs significantly across CPU architectures, these ported programs share the same code base as their desktop versions. It is therefore feasible to use the program built for a commodity computer to help understand the cross-compiled binaries and locate functions with similar semantics. However, because the instruction sets of different architectures are generally incomparable, static cross-architecture binary code similarity comparison is difficult. To address this, we propose a semantics-based approach. We dynamically extract a signature, composed of the behavior of conditional operations together with system call information, from binaries on different platforms in the same manner. We then measure the similarity of these signatures to help identify functions in ported programs. We have implemented the approach in MOCKINGBIRD, an automated analysis tool that compares code similarity between binaries across architectures. MOCKINGBIRD supports mainstream architectures and can analyze ELF executables on the Linux platform. We evaluated MOCKINGBIRD on a set of popular programs with cross-compiled versions. The results show that our approach is not only effective for the new problem of cross-architecture binary code comparison, but also improves the accuracy of similarity-based function identification thanks to its use of semantic information.
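The abstract describes matching functions across architectures by comparing dynamically extracted signatures (conditional-operation behavior plus system call information), but it does not state the signature layout or the similarity metric. The following is a minimal sketch under explicit assumptions, not MOCKINGBIRD's implementation: a signature is assumed to be the list of values observed at conditional operations plus the ordered list of system call names; conditional values are compared with a multiset Jaccard index and system call sequences with a normalized longest-common-subsequence ratio, combined by a hypothetical `weight` parameter. All names (`Signature`, `similarity`, `best_match`, `sub_1040`, and so on) and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Signature:
    # Hypothetical layout: operand values seen at conditional operations,
    # and the order in which system calls were issued.
    cond_values: list[int] = field(default_factory=list)
    syscalls: list[str] = field(default_factory=list)


def multiset_jaccard(a, b) -> float:
    """|A ∩ B| / |A ∪ B| over multisets; two empty inputs count as identical."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    if union == 0:
        return 1.0
    return sum((ca & cb).values()) / union


def lcs_ratio(a, b) -> float:
    """2 * LCS(a, b) / (len(a) + len(b)), computed row by row with DP."""
    if not a and not b:
        return 1.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return 2 * prev[-1] / (len(a) + len(b))


def similarity(s1: Signature, s2: Signature, weight: float = 0.5) -> float:
    # Assumed weighting: conditional behavior and syscall order contribute equally.
    return (weight * multiset_jaccard(s1.cond_values, s2.cond_values)
            + (1 - weight) * lcs_ratio(s1.syscalls, s2.syscalls))


def best_match(ref: Signature, candidates: dict[str, Signature]) -> tuple[str, float]:
    """Return the candidate function whose signature is most similar to ref."""
    return max(((name, similarity(ref, sig)) for name, sig in candidates.items()),
               key=lambda t: t[1])


# Made-up example: one function traced on x86, two candidates traced on ARM.
x86_sig = Signature(cond_values=[0, 10, 255, 10], syscalls=["open", "read", "close"])
arm_funcs = {
    "sub_1040": Signature(cond_values=[0, 10, 255, 10], syscalls=["open", "read", "close"]),
    "sub_2200": Signature(cond_values=[1, 2], syscalls=["write"]),
}
print(best_match(x86_sig, arm_funcs))  # → ('sub_1040', 1.0)
```

Because both features are recorded at run time rather than read from instructions, the comparison never has to relate x86 opcodes to ARM or MIPS opcodes, which is the property the abstract relies on. The concrete metrics above are interchangeable choices for illustration.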
What problem does this paper attempt to address?