LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code

Wei Tang, Ping Luo, Jialiang Fu, Dan Zhang
DOI: https://doi.org/10.1109/saner48275.2020.9054845
2020-01-01
Abstract: With the development of the open-source movement, reusing third-party libraries has become common practice in programming. Application developers reuse this code to save time and development costs. However, misusing third-party libraries carries hidden risks, such as license violations and security vulnerabilities. Identifying libraries written in C or C++ is impeded by the compilation process, which hides most features of the source code: the same open-source package can be compiled into different binary code by different compilation processes. This paper therefore proposes LibDX, a platform-independent and fully automated system for detecting reused libraries in binary files. With a well-designed feature extractor, LibDX overcomes compilation diversity between binary files. LibDX also introduces the concept of a logic feature block, which addresses the feature duplication challenge in a large-scale feature database. We built a large test data set covering multiple platforms and evaluated LibDX on 9.5K packages comprising 25.8K C/C++ binary files. Our results show that LibDX achieves a precision of 92% and a recall of 97%, outperforming state-of-the-art tools. We also validated the system on closed-source commercial applications and found some license violation cases.