BCFinder: A Lightweight and Platform-Independent Tool to Find Third-Party Components in Binaries

Wei Tang, Du Chen, Ping Luo
DOI: https://doi.org/10.1109/apsec.2018.00043
2018-01-01
Abstract: The open source movement has fostered several open source communities, which now host millions of open source repositories (repos). Consequently, component-based development and code reuse greatly improve the efficiency of software development. However, they can also introduce problems such as license violations and security weaknesses. While code reuse detection has been studied extensively for source code, detecting third-party components in binaries, especially against a large-scale database such as GitHub, has received less attention. In this paper, we apply a series of data cleaning steps to obtain a filtered set of 22K C/C++ repos on GitHub. We extend code reuse detection for binaries to this large-scale data set and design a system called BCFinder as an assistant tool for binary analysis. BCFinder finds third-party components in binaries automatically by feature matching. We evaluate BCFinder on a number of real-world binary programs across platforms and compiler configurations. Experiments show that BCFinder is an effective supplementary tool for binary analysis. BCFinder is, so far, the first lightweight, rapid and platform-independent tool to detect component reuse in binaries against a large-scale database such as GitHub.
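The abstract does not say which features BCFinder extracts or how it scores a match. As a rough illustration of the general idea of feature matching, the sketch below assumes that the features are printable string literals that survive compilation, and that a component is reported when a large enough fraction of its known strings appears in the binary. The function names, the `min_len` and `threshold` parameters, and the scoring rule are all hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict


def extract_strings(blob: bytes, min_len: int = 6) -> set[str]:
    """Return printable ASCII runs of at least min_len bytes.

    This mimics the Unix `strings` tool; it is an assumed feature
    extractor, not the one described in the paper.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {m.group().decode("ascii") for m in pattern.finditer(blob)}


def build_index(repo_features: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build an inverted index: feature string -> repos that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for repo, features in repo_features.items():
        for feature in features:
            index[feature].add(repo)
    return index


def match_components(
    blob: bytes,
    repo_features: dict[str, set[str]],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Score each repo by the fraction of its features found in the binary.

    Only exact matches of whole printable runs are counted. Repos whose
    score is below `threshold` (a hypothetical cut-off) are dropped.
    """
    index = build_index(repo_features)
    hits: dict[str, int] = defaultdict(int)
    for s in extract_strings(blob):
        for repo in index.get(s, ()):
            hits[repo] += 1
    scores = {repo: hits[repo] / len(repo_features[repo]) for repo in hits}
    return {repo: score for repo, score in scores.items() if score >= threshold}
```

The inverted index keeps each lookup independent of the number of repos, which is one plausible way to keep matching lightweight against a database of tens of thousands of repos. String literals are also largely unaffected by the target platform or optimization level, which is why they are a common choice for this kind of matching.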