BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features

Xiaoya Zhu,Junfeng Wang,Zhiyang Fang,Xiaokang Yin,Shengli Liu
DOI: https://doi.org/10.3390/app13010413
2022-01-01
Applied Sciences
Abstract:Third-party library (TPL) reuse may introduce vulnerable or malicious code and expose the software, which exposes them to potential risks. Thus, it is essential to identify third-party dependencies and take immediate corrective action to fix critical vulnerabilities when a damaged reusable component is found or reported. However, most of the existing methods only rely on syntactic features, which results in low recognition accuracy and significantly discounts the detection performance by obfuscation techniques. In addition, a few semantic-based approaches face the efficiency problem. To resolve these problems, we propose and implement a more precise and scalable TPL detection method BBDetector. In addition to syntactic features, we consider the rich function-level semantic features and form a feature vector for each function. Moreover, we design a scalable function vector similarity search method to identify anchor functions and the candidate libraries, based upon which we carry out TPL detection. The experiment results demonstrate that BBDetector outperforms B2SFinder and ModX in terms of effectiveness, efficiency, and obfuscation-resilient capability. For the nix binaries, the F1-score of BBDetector is 1.11% and 11.21% higher than that of ModX and B2SFinder, respectively. Moreover, for the Ubuntu binaries, the F1-score of BBDetector is 1.32% and 14.93% is higher than that of ModX and B2SFinder, respectively. And in terms of efficiency, the detection time of BBDetector is only 30.02% of ModX. Besides, for the obfuscation-resilient capability, BBDetector is much stronger than B2SFinder. BBDetector achieves a F1-score of 71%, slightly lower than the F1-score of 77% achieved with the non-obfuscated binary programs. However, B2SFinder only achieves an F1-score of 28%, much lower than that of 67% achieved with the non-obfuscated binary programs.
What problem does this paper attempt to address?